Sure, if you ignore latency. In reality it's an unnecessary ~0.001% increase in load time, because that size reduction isn't enough to matter against the round-trip time. And the time you save transmitting 55 fewer KiB is probably less than the time lost to decompression. :p
While fun, I would expect this specific scenario to actually be worse for the user experience, not better. Speed will be a complete wash and compatibility will be worse.
The tragedy here is that while some people, such as the author of TFA, go to great lengths to get from about 100 to 50 kB, others don't think twice about sending me literally tens of megabytes of images when I just want to know when a restaurant is open – on roaming data.
Resource awareness exists, but it's unfortunately very unevenly distributed.
I wish there was a bit of an opposite option - a "don't lazy/partially load anything" for those of us on fiber watching images pop up as we scroll past them in the page that's been open for a minute.
A little OT, and I'm not sure if iOS has this ability, but I found that while I'm traveling, if I enable Data Saver on my (Android) phone, I can easily go a couple weeks using under 500MB of cellular data. (I also hop onto public wifi whenever it's available, so being in a place with lots of that is helpful.)
My partner, who has an iPhone and couldn't find an option like that (maybe it exists; I don't think she tried very hard to find it), blew through her 5GB of free high-speed roaming data (T-Mobile; after that you get 256kbps, essentially unusable) in 5 or 6 days on that same trip.
It turns out there's so much crap going on in the background, and it's all so unnecessary for the general user experience. And I bet it saves battery too. Anything that uses Google's push notification system still works fine and gets timely notifications, as IIRC that connection is exempt from the data-saving feature.
I've thought about leaving Data Saver on all the time, even when on my home cellular network. Should probably try it and see how it goes.
But overall, yes, it would be great if website designers didn't design as if everyone is on an unmetered gigabit link with 5ms latency...
She might have kept tapping Sync at the bottom of Photos even though iOS itself pauses it when in Low Data mode. iCloud Photos and video syncing is a data killer if you're on holiday abroad; my wife takes hundreds of photos and videos a day, so imagine what that does to data.
So, for any reasonable connection the difference doesn’t matter; for actually gruesomely slow/unreliable connections where 50KB matters this is markedly worse. While a fun experiment, please don’t do it on your site.
1: browsers choose when to download files and run JavaScript. It is not as easy as one might think to force JavaScript to run immediately at high priority (which it needs to be when it is on the critical path to painting).
2: you lose certain browser optimisations where normally many things are done in parallel. Instead you are introducing delays into the critical path, and those delays might not be worth the "gain".
3: Browsers do great things to start requesting files in parallel as they are discovered in the HTML/CSS. Removing that feature can be a poor tradeoff.
There are a few other unobvious downsides. I would never deploy anything like that to a production site without serious engineering effort to measure the costs and benefits.
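For what it's worth, some of the parallelism from point 3 can be clawed back by declaring the hidden payload up front with a preload hint. A minimal sketch, with a hypothetical payload URL standing in for whatever the inline loader actually fetches (my suggestion, not something from the article):

    // Sketch only: give the browser's preload scanner something to work with.
    // "/payload.webp" is a hypothetical stand-in for the real payload URL.
    const hint = document.createElement("link");
    hint.rel = "preload";
    hint.as = "image"; // the payload is fetched and decoded as a WebP image
    hint.href = "/payload.webp";
    document.head.appendChild(hint);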
I'm not up to date on web/font development – does anybody know what that does?
So the purpose is effectively to have human-readable CSS class names to refer to given glyphs in the font, rather than having stray private use Unicode characters in the HTML?
This is a reasonable approach if you have a large number of icons across large parts of the site, but you should always compile the CSS/icon set down to only those used.
If only a few icons, and the icons are small, then inlining the SVG is a better option. But if you have too many SVGs directly embedded on the site, the page size itself will suffer.
As always with website optimization, whether something is a good option always “depends”.
Pictures are pictures, text is text. <img> tag exists for a reason.
Icon fonts are used all over the place - look at the terminal nowadays. Most TUIs require an icon font to be installed.
So it's a skill issue.
Just removing the inline duplicated SVGs used for light/dark mode and using styles instead would bring it down 30%. Replacing the paths used to write text (!) with <text> and making most of the overly complicated paths into simple rect patterns would take care of the rest.
For who? Not for 99.9% of the people clicking the link here on HN. Even regular visitors of a blog will likely no longer have things in cache for their next visit.
Definitely a fascinating post though, there were things I’ve not encountered before.
Won't be a 2.5x difference, but also not 0.001%.
Also when you get to the end, you then see
> The actual savings here are moderate: the original is 88 KiB with gzip, and the WebP one is 83 KiB with gzip. In contrast, Brotli would provide 69 KiB.
At 69 KiB you're still over the default TCP packet max, which means both cases transmit the same number of packets, one just has a bunch of extra overhead added for the extra JavaScript fetch, load, and execute.
The time saved here is going to be negligible at best anyway, but it actually looks to be negative because we're burning time without reducing the number of needed packets at all.
> At 69 KiB you're still over the default TCP packet max, which means both cases transmit the same number of packets,
What? No, they absolutely don't transmit the same number of packets. Did you mean some other word?
However, Ethernet has an MTU (Maximum Transmission Unit) of 1500 bytes, unless jumbo frames are used.
And so I agree with you, the number of packets that will be sent for 69 KiB vs 92 KiB will likely be different.
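For a rough sense of scale, a back-of-the-envelope segment count (my own numbers, assuming a 1500-byte MTU and roughly 1460 bytes of TCP payload per segment):

    // Approximate TCP segments needed for each payload size, assuming a
    // 1500-byte Ethernet MTU and ~1460 bytes of payload per segment.
    const MSS = 1460;
    for (const kib of [69, 92]) {
      const segments = Math.ceil((kib * 1024) / MSS);
      console.log(`${kib} KiB -> about ${segments} segments`);
    }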
- client requests X
- client gets X, which contains a reference to Y
- therefore client requests Y
So you're starting a new request that depends on the client having received the first one. (Although upon closer inspection I think the technique described in the blog post manages to fit everything into the first response, so I'm not sure how relevant this is.)

If you want to learn more, pretty much any resource on TCP should explain this stuff. Here's something I wrote years ago; the background section should be pretty applicable: https://www.snellman.net/blog/archive/2017-08-19-slow-ps4-do...
- client requests X
- server sends bytes 0-2k of X
- client acknowledges bytes 0-2k of X
- server sends bytes 2k-6k of X
- client acknowledges bytes 2k-6k of X
- server sends bytes 6k-14k of X
- client acknowledges bytes 6k-14k of X
- server sends bytes 14k-30k of X
- client acknowledges bytes 14k-30k of X
- server sends bytes 30k-62k of X
- client acknowledges bytes 30k-62k of X
- server sends bytes 62k-83k of X
- client acknowledges bytes 62k-83k of X
- client has received X, which contains a reference to Y
- therefore client requests Y
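A toy model of that buildup, assuming the ~2 KiB initial window shown above that doubles every round trip (real stacks usually start with around 10 segments, and the details depend on the congestion-control algorithm):

    // Toy slow-start model: the server sends one congestion window per round trip,
    // and the window doubles each time, mirroring the 2k/4k/8k/... schedule above.
    function roundTrips(totalBytes: number, initialWindow = 2 * 1024): number {
      let delivered = 0;
      let cwnd = initialWindow;
      let trips = 0;
      while (delivered < totalBytes) {
        delivered += cwnd;
        cwnd *= 2;
        trips += 1;
      }
      return trips;
    }

    console.log(roundTrips(69 * 1024)); // 6 round trips in this model
    console.log(roundTrips(92 * 1024)); // also 6; the extra cost discussed here is the
                                        // dependent request for Y, which adds its own RTTs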
It's all about TCP congestion control here. There are dozens of algorithms used to handle it, but in pretty much all cases you want some kind of slow buildup in order to avoid completely swamping a slower connection and having all but the first few of your packets dropped.

Doesn't the client see the reference to Y at this point? Modern browsers start parsing HTML even before they receive the whole document.
I suppose it could make a difference on lossy networks, but I'm not sure.
These are similar conversations people have around hydration, by the by.
For the uninitiated: https://en.m.wikipedia.org/wiki/Hydration_(web_development)
Edit: Ah, I see OP's code requests the webp separately. You can avoid the extra request if you write a self-extracting html/webp polyglot file, as is typically done in the demoscene.
Even if you transmit the JS stuff inline, the OP's notion of time still ignores the fact that it takes the client time to even ask the server for the data in the first place, and at such small sizes that round trip swallows the transmission time from the user's perspective.
It is technically 2 requests, but the second one is a cache hit, in my testing.
OP is only looking at transmit size differences, which is both not the same as transmit time differences and also not what the user actually experiences when requesting the page.
> This code minifies to about 550 bytes. Together with the WebP itself, this amounts to 44 KiB. In comparison, gzip was 92 KiB, and Brotli would be 37 KiB.
But regarding the current one:
> The actual savings here are moderate: the original is 88 KiB with gzip, and the WebP one is 83 KiB with gzip. In contrast, Brotli would provide 69 KiB. Better than nothing, though.
Most of the other examples don't show dramatic (like more than factor-of-2) differences between the compression methods either. In my own local testing (on Python wheel data, which should be mostly Python source code, thus text that's full of common identifiers and keywords) I find that XZ typically outperforms gzip by about 25%, while Brotli doesn't do any better than XZ.
Also, XZ (or LZMA/LZMA2 in general) produces smaller output than Brotli given lots of free time, but is much slower than Brotli when targeting the same compression ratio. This is because LZMA/LZMA2 uses an adaptive range coder and multiple code distribution contexts, both of which contribute heavily to the slowness when higher compression ratios are requested. Brotli only has the latter, and its coding is just a bitwise Huffman coder.
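If you want to reproduce that kind of comparison locally, Node ships both gzip and Brotli (XZ/LZMA is not built in, so it's left out of this sketch; the file name is a placeholder):

    // Compare gzip vs. Brotli output sizes on a local file using Node's built-in zlib.
    import { readFileSync } from "node:fs";
    import { gzipSync, brotliCompressSync, constants } from "node:zlib";

    const data = readFileSync("sample.html"); // placeholder: any text-heavy file
    const gz = gzipSync(data, { level: 9 });
    const br = brotliCompressSync(data, {
      params: { [constants.BROTLI_PARAM_QUALITY]: 11 }, // max quality, as for precompressed assets
    });
    console.log(`raw ${data.length} B, gzip ${gz.length} B, brotli ${br.length} B`);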
> keep the styling and the top of the page (about 8 KiB uncompressed) in the gzipped HTML and only compress the content below the viewport with WebP
Ah, that explains why the article suddenly cut off after a random sentence, with an empty page following. I'm using LibreWolf, which disables WebGL, and I use Chromium for random web games that need WebGL. The article worked just fine with WebGL enabled; neat technique, to be honest.
An author might reasonably prefer 90% of people visiting his site to 100% of people consuming the content indirectly.
[1] https://js1024.fun/demos/2022/18/readme
[2] https://gist.github.com/lifthrasiir/1c7f9c5a421ad39c1af19a9c...
> The only possibility is to use the WOFF2 font file format which Brotli was originally designed for, but you need to make a whole font file to leverage this. This got more complicated recently by the fact that modern browsers sanitize font files, typically by the OpenType Sanitizer (OTS), as it is very insecure to put untrusted font files directly to the system. Therefore we need to make an WOFF2 file that is sane enough to be accepted by OTS _and_ has a desired byte sequence inside which can be somehow extracted. After lots of failed experiments, I settled on the glyph widths ("advance") which get encoded in a sequence of two-byte signed integers with almost no other restrictions.
Fantastic idea!
Unfortunately we live in a world where Google decides to rip JPEG-XL support out of Chrome for seemingly no reason other than spite. If the reason was a lack of maturity in the underlying library, fine, but that wasn’t the reason they offered.
Of course there is, and it's really boring: prioritisation and maintenance.
It's a big pain to add, say, 100 compression formats and support them indefinitely, especially with little differentiation between them. Once we agree on what the upper bound of useless formats is, we can start to negotiate what the lower limit is.
And I qualified it with mature implementation because I agree that if there is no implementation which has a clear specification, is well written, actively maintained, and free of jank, then it ought not qualify.
Relative to the current status quo, I would only imagine the number of data compression, image compression, and media compression options to increase by a handful. Single digits. But the sooner we add them, the sooner they can become sufficiently widely deployed as to be useful.
As far as I know, it was already making the smallest JPEGs out of any of the web compression tools, but WebP was coming out only ~50% of the size of the JPEGs. It was an easy decision to make WebP the default not too long after adding support for it.
Quite a lot of people use the site, so I was anticipating some complaints after making WebP the default, but it's been about a month and so far there has been only one complaint/enquiry about WebP. It seems that almost all tools & browsers now support WebP. I've only encountered one website recently where uploading a WebP image wasn't handled correctly and blocked the next step. Almost everything supports it well these days.
You can always reduce file size of a JPEG by making a WebP that looks almost the same, but you can also do that by recompressing a JPEG to a JPEG that looks almost the same. That's just a property of all lossy codecs, and the fact that file size grows exponentially with quality, so people are always surprised how even tiny almost invisible quality degradation can change the file sizes substantially.
Based on which quality comparison metric? WebP has a history of atrocious defaults that murder detail in dark areas.
It really depends on what you're after, right? If preserving every detail matters to you, lossless is what you want. That's not going to create a good web experience for most users, though.
It’s, strictly speaking, invalid HTML, but it still successfully triggers standards mode.
See https://GitHub.com/kangax/html-minifier/pull/970 / https://HTML.spec.WHATWG.org/multipage/parsing.html#parse-er...
(I too use that trick on https://FreeSolitaire.win)
That’s for the whole game: graphics are inline SVGs, JS & CSS are embedded in <script> and <style> elements.
0: https://github.com/KTibow/KTibow/issues/3#issuecomment-23367...
Edit: I found my prototype from way back, I guess I was just testing heh: https://retr0.id/stuff/bee_movie.webp.html
Something I wanted to do but clearly never got around to was figuring out how to put an open-comment sequence (<!--) in a header somewhere, so that most of the garbage gets commented out.
You likely have dozens of copies of Google Fonts, each in a separate silo, with absolutely zero reuse between websites.
This is because a global cache used to work like a cookie, and has been used for tracking.
Well, at least you don't have to download it more than once per site, but first impressions matter, yeah.
> Alright, so we’re dealing with 92 KiB for gzip vs 37 + 71 KiB for Brotli. Umm…
That said, the overhead of gzip vs Brotli HTML compression is nothing compared with the amount of JS/images/video current websites use.
.webm can go away, though.
WebM on the other hand still has a reason to exist unfortunately: patents on H.264.
So there is a small chance Google will reverse the removal from Chrome.
[1] https://github.com/gildas-lormeau/SingleFile?tab=readme-ov-f...
[2] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG
[3] https://github.com/gildas-lormeau/Polyglot-HTML-ZIP-PNG/raw/...
Note that "way slower" applies to compression speed, not decompression. So Brotli is a good bet if you can precompress.
> Annoyingly, I host my blog on GitHub pages, which doesn’t support Brotli.
If your users all use modern browsers and you host static pages through a service like Cloudflare or CloudFront that supports custom HTTP headers, you can implement your own Brotli support by precompressing the static files with Brotli and adding a Content-Encoding: br HTTP header. This is kind of cheating because you are ignoring proper content negotiation with Accept-Encoding, but I’ve done it successfully for sites with targeted user bases.
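Here's a sketch of the same idea without a CDN in front, with a plain Node server standing in for the custom-header configuration (file names are hypothetical):

    // Precompress at build time, then always serve the .br variant with an explicit
    // Content-Encoding header. This skips Accept-Encoding negotiation, as noted above.
    import { readFileSync, writeFileSync } from "node:fs";
    import { brotliCompressSync, constants } from "node:zlib";
    import { createServer } from "node:http";

    // Build step: write index.html.br next to index.html.
    writeFileSync(
      "index.html.br",
      brotliCompressSync(readFileSync("index.html"), {
        params: { [constants.BROTLI_PARAM_QUALITY]: 11 },
      })
    );

    // Serve step: every modern browser advertises "br", so just send it.
    createServer((_req, res) => {
      res.writeHead(200, {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Encoding": "br",
      });
      res.end(readFileSync("index.html.br"));
    }).listen(8080);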
Well, it didn't work in Materialistic (I guess their webview disables JS), and the failure mode is really not graceful.
[1] https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
8 kilobytes? Rookie numbers. I'll do it in 256 bytes, as long as you're fine with a somewhat limited selection of available digital movie files ;)
I call it the High Amplitude Shrinkage Heuristic, or H.A.S.H.
It is also reversible, but only safely to the last encoded file due to quantum hyperspace entanglement of ionic bonds. H.A.S.H.ing a different file will disrupt them preventing recovery of the original data.
You'd also want "seed" and "engine" attributes to ensure all visitors see the same result.
One of the best uses of responsive design I've ever seen was a site that looked completely different at different breakpoints - different theme, font, images, and content. It was beautiful, and creative, and fun. Lots of users saw different things and had no idea other versions were there.
Not so much for us on earth however.
That idea is something that is only cool in theory.
Currently, we're definitely not there in terms of space/time tradeoffs for images, but I could imagine at least parameterized ML-based upscaling (i.e. ship a low-resolution image and possibly a textual description, have a local model upscale it to display resolution) at some point.
It’s utterly impractical, but fun to muse about how neat it would be if it weren’t.
By comparison, you could easily define a number that goes 0.123456789101112131415… and use indexes into that number. However, the index would probably be larger than what you're trying to encode.
I am curious what the compression ratios would be. I suspect the opposite, but the numbers are at a scale where my mind falters so I wouldn’t say that with any confidence. Just 64 bits can get you roughly 10^20 digits into the number, and the “reach” grows exponentially with bits. I would expect that the smaller the file, the more common its sequence is.
Let's do it for a similar number but in binary… a 1 MB file has 2²³ binary digits (2²⁰ bytes). Even if we optimize the indexing to point to the "number" instead of the "digit", so that the index is smaller… magically the index is still about as long as the file!
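For fun, a quick empirical check of both intuitions (the digit count and the needles are arbitrary choices of mine):

    // Build the first two million digits of 0.123456789101112... and see where a few
    // digit strings first appear. Short strings show up early, but the index quickly
    // needs about as many digits as the string it points to.
    function champernowneDigits(count: number): string {
      const parts: string[] = [];
      let length = 0;
      for (let n = 1; length < count; n++) {
        const s = n.toString();
        parts.push(s);
        length += s.length;
      }
      return parts.join("").slice(0, count);
    }

    const haystack = champernowneDigits(2_000_000);
    for (const needle of ["42", "1999", "271828"]) {
      const index = haystack.indexOf(needle);
      console.log(`"${needle}" (${needle.length} digits) is first found at index ${index},`,
        `which takes ${String(index).length} digits to write down`);
    }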
https://m.youtube.com/watch?v=TQy3EU8BCmo
You can really feel the "compute has massively outpaced networking speed" where this kind of thing is actually practical. Maybe I'll see 10G residential in my lifetime.
The future is already here – it's just not very evenly distributed.
Of course, the downsides became apparent once the euphoria had faded.
Arguably the cruelest implication of the pigeonhole principle.
But...
> Annoyingly, I host my blog on GitHub pages, which doesn’t support Brotli.
Is the glaringly obvious solution to this not as obvious as I think it is?
TFA went through a lot of roundabout work to get (some) Brotli compression. Very impressive yak shave!
If you're married to the idea of a Git-based automatically published web site, you could at least replicate your code and site to Gitlab Pages, which has supported precompressed Brotli since 2019. Or use one of Cloudflare's free tier services. There's a variety of ways to solve this problem before the first byte is sent to the client.
Far too much of the world's source code already depends exclusively on GitHub. I find it distasteful to also have the small web do the same while blindly accepting an inferior experience and worse technology.
I'll probably switch to Cloudflare Pages someday when I have time to do that.
> As far as I know, browsers are only shipping the decompression dictionary. Brotli has a separate dictionary needed for compression, which would significantly increase the size of the browser.
How can the decompression dictionary be smaller than the compression one? Does the latter contain something like a space-time tradeoff in the form of precalculated most efficient representations of given input substrings or something similar?
[1] https://github.com/google/brotli/blob/master/c/enc/dictionar...
It's a bit disappointing you can't use Brotli in the DecompressionStream() interface just because it may or may not be available in the CompressionStream() interface though.
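For reference, this is roughly what native streaming decompression looks like today; "br" just isn't an accepted format, so the sketch falls back to gzip (the URL is hypothetical):

    // DecompressionStream currently accepts only "gzip", "deflate", and "deflate-raw".
    // If "br" were ever added, a Brotli-in-WASM shim could be replaced with this.
    const response = await fetch("/payload.bin.gz"); // hypothetical pre-gzipped asset
    const decompressed = response.body!.pipeThrough(new DecompressionStream("gzip"));
    const text = await new Response(decompressed).text();
    console.log(`decompressed to ${text.length} characters`);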
My suspicion is that this is a confusion of the (runtime) sliding window, which limits maximum required memory on the decoder's side to 16 MB, with the actual shared static dictionary (which needs to be present in the decoder only, as far as I can tell; the encoder can use it, and if it does, it would be the same one the decoder has as well).
On one hand it seems a bit silly to worry about ~100 KB in the browser for what will probably, on average, save more than that in upload/download the first time it is used. On the other hand, "it's just a few hundred KB" each release for a few hundred releases ends up being a lot of cruft you can't remove without breaking old stuff. On the third hand coming out of our head... it's not like Chrome has been shy about shipping much more than that for features it wants to impose on users whether they want them or not, so what's one small addition users would actually benefit from against that?
So, just use the anti-fingerprint noise as a cookie, I guess?
I opened the page in Firefox like the article suggests and I get a different pattern per site and session. That prevents using the noise as a supercookie, I think, if its pattern changes every time cookies are deleted.
Also, it's a long shot, but could the combo of FEC (+size) and lossy compression (-size) be a net win?
(1) compatibility
(2) features
WebP still seems far behind on (1) to me so I don't care about the rest. I hope it gets there, though, because folks like this seem pretty enthusiastic about (2).
Lossy WebP comes out a lot smaller than JPEG. It's definitely worth taking the saving.
That may apply to old "LTS" Linuxes, but not any relatively recent one. Xviewer and GIMP immediately come to mind as supporting it, and I haven't had a graphics viewer on Linux _not_ be able to view WebP in at least 3 or 4 years.
Under normal circumstances you're probably very right
¹ Now I wonder if the makers foresaw how their protocol name might sound to us now
>manually decompress it in JavaScript
>Brotli decompressor in WASM
the irony seems lost
zstd is a general-purpose compressor. By and large (and I'm unaware of any exceptions), specialized/format-specific compression (like PNG, WebP, etc.) will compress better than a general-purpose compressor, because format-specific compressors can take advantage of quirks of the format which a general-purpose solution cannot. Also, format-specific ones are often lossy (or conditionally so), enabling them to trade lower fidelity for better compression, something a general-purpose compressor cannot do.
<img src="data:image/jpeg;base64,abc123..." />
(Double-check the exact syntax and the MIME type before you use it; it's been a few years since I have, and this example is from perhaps imperfect memory.)

I love reading blog posts like these.
What is the point of doing this sort of thing if you don't even test how much faster or slower it made the page load?