Huge Binaries (fzakaria.com)
181 points | 15 hours ago | 13 comments
yjftsjthsd-h
13 hours ago
[-]
> I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries.

I am very sympathetic to wanting nice static binaries that can be shipped around as a single artifact[0], but... surely at some point we have to ask if it's worth it? If nothing else, that feels like a little bit of a code smell; surely if your actual executable code doesn't even fit in 2GB it's time to ask if that's really one binary's worth of code or if you're actually staring at like... a dozen applications that deserve to be separate? Or get over it the other way and accept that sometimes the single artifact you ship is a tarball / OCI image / EROFS image for systemd[1] to mount+run / self-extracting archive[2] / ...

[0] Seriously, one of my background projects right now is trying to figure out if it's really that hard to make fat ELF binaries.

[1] https://systemd.io/PORTABLE_SERVICES/

[2] https://justine.lol/ape.html > "PKZIP Executables Make Pretty Good Containers"

reply
jmmv
12 hours ago
[-]
This is something that always bothered me while I was working at Google too: we had an amazing compute and storage infrastructure that kept getting crazier and crazier over the years (in terms of performance, scalability and redundancy) but everything in operations felt slow because of the massive size of binaries. Running a command line binary? Slow. Building a binary for deployment? Slow. Deploying a binary? Slow.

The answer to an ever-increasing size of binaries was always "let's make the infrastructure scale up!" instead of "let's... not do this crazy thing maybe?". By the time I left, there were some new initiatives towards the latter and the feeling that "maybe we should have put limits much earlier" but retrofitting limits into the existing bloat was going to be exceedingly difficult.

reply
darubedarob
4 hours ago
[-]
I think Google of all companies could build a good auto-stripper: reduce the binary up front and partially load the stripped code on misses. It can't be much slower than shovelling a full monorepo's worth of code plus symbols into RAM.
reply
loeg
4 hours ago
[-]
The low-hanging fruit is just not shipping the debuginfo, of course.
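The standard binutils recipe for that looks roughly like this (a sketch; "myservice" is a made-up name):

    # Copy the DWARF data out into its own file...
    objcopy --only-keep-debug myservice myservice.debug
    # ...drop it from the binary you actually deploy...
    strip --strip-debug myservice
    # ...and leave a breadcrumb so gdb can still find the symbols.
    objcopy --add-gnu-debuglink=myservice.debug myservice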
reply
usefulcat
59 minutes ago
[-]
Is compressed debug info a thing? It seems likely to compress well, and if it's rarely used then it might be a worthwhile thing to do?
reply
loeg
15 minutes ago
[-]
It is: https://maskray.me/blog/2022-01-23-compressed-debug-sections

But the compression ratio isn't magical (roughly 4:1, i.e. the debug sections shrink to about a quarter, for both zlib and zstd in the examples given). You'd probably still want to set aside debuginfo in separate files.
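Turning it on is just a flag, roughly like this (a sketch; zstd support needs a fairly recent GCC/clang and binutils, and "myservice" is a made-up name):

    # Emit compressed .debug_* sections at build time:
    gcc -g -gz=zstd -o myservice main.c
    # Or compress an existing binary's debug sections after the fact:
    objcopy --compress-debug-sections=zstd myservice
    # Compressed sections show the C (SHF_COMPRESSED) flag here:
    readelf -SW myservice | grep debug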

reply
lenkite
5 hours ago
[-]
Maybe I am missing something, but why didn't they just leverage dynamic libraries?
reply
btilly
3 hours ago
[-]
When I was at Google, on an SRE team, here is the explanation that I was given.

Early on Google used dynamic libraries. But weird things happen at Google scale. For example Google has a dataset known, for fairly obvious reasons, as "the web". Basically any interesting computation with it takes years. Enough to be a multiple of the expected lifespan of a random computer. Therefore during that computation, you have to expect every random thing that tends to go wrong, to go wrong. Up to and including machines dying.

One of the weird things that becomes common at Google scale is cosmic bit flips. With static binaries, you can figure out that something went wrong, kill the instance, launch a new one, and you're fine. That machine will later launch something else and also be fine.

But what happens if there was a cosmic bit flip in a dynamic library? Everything launched on that machine will be wrong. This has to get detected, then the processes killed and relaunched. Since this keeps happening, that machine is always there lightly loaded, ready for new stuff to launch. New stuff that... winds up broken for the same reason! Often the killed process will relaunch on the bad machine, failing again! This will continue until someone reboots the machine.

Static binaries are wasteful. But they aren't as problematic for the infrastructure as detecting and fixing this particular condition. And, according to SRE lore circa 2010, this was the actual reason for the switch to static binaries. And then they realized all sorts of other benefits. Like having a good upgrade path for what would normally be shared libraries.

reply
ambrosio
1 hour ago
[-]
> But what happens if there was a cosmic bit flip in a dynamic library?

I think there were more basic reasons we didn't ship shared libraries to production.

1. They wouldn't have been "shared", because every program was built from its own snapshot of the monorepo, and would naturally have slightly different library versions. Nobody worried about ABI compatibility when evolving C++ interfaces, so (in general) it wasn't possible to reuse a .so built at another time. Thus, it wouldn't actually save any disk space or memory to use dynamic linking.

2. When I arrived in 2005, the build system was embedding absolute paths to shared libraries into the final executable. So it wasn't possible to take a dynamically linked program, copy it to a different machine, and execute it there, unless you used a chroot or container. (And at that time we didn't even use mount namespaces on prod machines.) This was one of the things we had to fix to make it possible to run tests on Forge.

3. We did use shared libraries for tests, and this revealed that ld.so's algorithm for symbol resolution was quadratic in the number of shared objects. Andrew Chatham fixed some of this (https://sourceware.org/legacy-ml/libc-alpha/2006-01/msg00018...), and I got the rest of it eventually; but there was a time before GRTE, when we didn't have a straightforward way to patch the glibc in prod.

That said, I did hear a similar story from an SRE about fear of bitflips being the reason they wouldn't put the gws command line into a flagfile. So I can imagine it being a rationale for not even trying to fix the above problems in order to enable dynamic linking.

> Since this keeps happening, that machine is always there lightly loaded, ready for new stuff to launch. New stuff that... winds up broken for the same reason!

I did see this failure mode occur for similar reasons, such as corruption of the symlinks in /lib. (google3 executables were typically not totally static, but still linked libc itself dynamically.) But it always seemed to me that we had way more problems attributable to kernel, firmware, and CPU bugs than to SEUs.

reply
btilly
24 minutes ago
[-]
Thanks. It is nice to hear another perspective on this.

But here is a question. How much of SEUs not being a problem was because they genuinely weren't a problem, versus because there were solutions in place to mitigate the potential severity of that kind of problem? (The other problems that you name are harder to mitigate.)

reply
dh2022
3 hours ago
[-]
In Azure - which I think is at Google scale - everything is dynamically linked. Actually a lot of Azure is built on C# which does not even support static linking...

Static linking being necessary for scaling does not pass the smell test for me.

reply
btilly
39 minutes ago
[-]
Azure's devops record is not nearly as good as Google's was.

The biggest datasets that ChatGPT is aware of being processed in complex analytics jobs on Azure are roughly a thousand times smaller than an estimate of Google's regularly processed snapshot of the web. There is a reason why most of the fundamental advancements in how to parallelize data and computations - such as MapReduce and BigTable - all came from Google. Nobody else worked at their scale before they did. (Then Google published it, and people began to implement it -- and then failed to understand what was operationally important to making it actually work at scale...)

So, despite how big it is, I don't think that Azure operates at Google scale.

For the record, back when I worked at Google, the public internet was only the third largest network that I knew of. Larger still was the network that Google uses for internal API calls. (Do you have any idea how many API calls it takes to serve a Google search page?) And larger still was the network that kept data synchronized between data centers. (So, for example, you don't lose your mail if a data center goes down.)

reply
mbreese
1 hour ago
[-]
I never worked for Google, but have seen some strange things like bit flips at more modest scales. From the parent description, it sounds like defaulting to static binaries helps speed up troubleshooting by removing the “this should never happen, but statistically will happen every so often” class of bugs.

As I see it, the issue isn’t requiring static compiling to scale. It’s requiring it to make troubleshooting or measuring performance at scale easier. Not required, per se, but very helpful.

reply
btilly
36 minutes ago
[-]
Exactly. SRE is about monitoring and troubleshooting at scale.

Google runs on a microservices architecture. It's done that since before that was cool. You have to do a lot to make a microservices architecture work. Google did not advertise a lot of that. Today we have things like Datadog that give you some of the basics. But for a long time, people who left Google faced a world of pain because of how far behind the rest of the world was.

reply
arccy
2 hours ago
[-]
Perhaps that's why Azure has such a bad reputation in the devops crowd.
reply
dh2022
14 minutes ago
[-]
Does AWS have a good reputation in devops? Because large chunks of AWS are built on Java - which also does not offer static linking (bundling a bunch of *.jar files into one exe does not count as static linking). Still does not pass the smell test.
reply
tmoertel
5 hours ago
[-]
One reason is that using static binaries greatly simplifies the problem of establishing Binary Provenance, upon which security claims and many other important things rely. In environments like Google’s it's important to know that what you have deployed to production is exactly what you think it is.

See for more: https://google.github.io/building-secure-and-reliable-system...

reply
joatmon-snoo
9 hours ago
[-]
There's a lot of tooling built on static binaries:

- google-wide profiling: the core C++ team can collect data on how much of fleet CPU % is spent in absl::flat_hash_map re-bucketing (you can find papers on this publicly)

- crashdump telemetry

- dapper stack trace -> codesearch

Borg literally had to pin the bash version because letting the bash version float caused bugs. I can't imagine how much harder debugging L7 proxy issues would be if I had to follow a .so rabbit hole.

I can believe shrinking binary size would solve a lot of problems, and I can imagine ways to solve the .so versioning problem, but for every problem you mention I can name multiple other probable causes (eg was startup time really execvp time, or was it networked deps like FFs).

reply
Filligree
7 hours ago
[-]
There’s no way my proxy binary actually requires 25GB of code, or even the 3GB it is. Sounds to me like the answer is a tree shaker.
reply
Sesse__
7 hours ago
[-]
Google implemented the C++ equivalent of a tree shaker in their build system around 2009.
reply
setheron
5 hours ago
[-]
The front-end services, to be "fast", AFAIK probably link in nearly all the services they need to avoid hops -- so you can't really shake that much away.
reply
bfrog
4 hours ago
[-]
Sounds like Google could really use Nix
reply
shevy-java
8 hours ago
[-]
reply
yjftsjthsd-h
4 hours ago
[-]
Portable across systemd/Linux systems, yes:)
reply
jcelerier
7 hours ago
[-]
What's wild to me is not using -gsplit-dwarf to have separate debug info and "normal-sized" binaries
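For reference, the fission workflow looks roughly like this (a sketch; file names invented):

    # DWARF goes into side .dwo files instead of the object files:
    gcc -g -gsplit-dwarf -c foo.c    # writes foo.o plus foo.dwo
    gcc -o myservice foo.o           # the link never touches the DWARF
    # Optionally bundle the .dwo files into one package with dwp
    # (from gold) or llvm-dwp:
    dwp -e myservice -o myservice.dwp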
reply
jeffbee
5 hours ago
[-]
Google contributed the code, and the entire concept, of DWARF fission to both GCC and LLVM. This suggests that rather than overlooking something obvious that they'll be embarrassed to learn on HN, they were aware of the issues and were using the solutions before you'd even heard of them.
reply
sionisrecur
5 hours ago
[-]
A case of the left hand not knowing what the right hand is doing?
reply
jeffbee
5 hours ago
[-]
There's no contradiction, no missing link in the facts of the story. They have a huge program, it is 2GiB minus epsilon of .text, and a much larger amount of DWARF stuff. The article is about how to use different code models to potentially go beyond 2GiB of text, and the size of the DWARF sections is irrelevant trivia.
reply
jcelerier
4 hours ago
[-]
> They have a huge program, it is 2GiB minus epsilon of .text,

but the article says 25+GiB including debug symbols, in a single binary?

also, I appreciate your enthusiasm in assuming that because some people do something in an organization, it is applied consistently everywhere. Hell, if it were Microsoft, other departments would try to shoot down the "debug tooling optimization" dept.

reply
loeg
4 hours ago
[-]
Yes, the 25GB figure in the article is basically irrelevant to the 2GB .text section concern. Most ELF files that size are 95%+ debuginfo.
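It's easy to check where the bytes go, e.g. with bloaty (Google's own size profiler, https://github.com/google/bloaty) or plain binutils -- a sketch, "myservice" being a made-up name:

    # Per-section breakdown; .debug_* usually dwarfs .text:
    bloaty -d sections ./myservice
    # Roughly the same view with binutils alone:
    size -A -d ./myservice | sort -k2 -n | tail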
reply
jeffbee
4 hours ago
[-]
ELF is just a container format and you can put literally anything into one of its sections. Whether the DWARF sections are in "the binary" or in another named file is really quite beside the point.
reply
forrestthewoods
11 hours ago
[-]
If you have 25gb of executables then I don’t think it matters if that’s one binary executable or a hundred. Something has gone horribly horribly wrong.

I don’t think I’ve ever seen a 4gb binary yet. I have seen instances where a PDB file hit 4gb and that caused problems. Debug symbols getting that large is totally plausible. I’m ok with that at least.

reply
niutech
3 hours ago
[-]
Llamafile (https://llamafile.ai) can easily exceed 4GB due to containing LLM weights inside. But remember, you cannot run >4GB executable files on Windows.
reply
wolfi1
9 hours ago
[-]
I did; it was a Spring Boot fat jar with an NLP model. I had to deploy it to the biggest instance AWS could offer, and the costs were enormous.
reply
loeg
4 hours ago
[-]
If you haven't seen a 25GB binary with debuginfo, you just aren't working in large, templated, C++ codebases. It's nothing special there.
reply
forrestthewoods
2 hours ago
[-]
Not quite. I very much work in large, templated, C++ codebases. But I do so on windows where the symbols are in a separate file the way the lord intended.
reply
throwawaymobule
8 hours ago
[-]
A few PS3 games I've seen had 4GB or more binaries.

This was a problem because code signing meant the whole binary needed to be completely replaced by updates.

reply
swiftcoder
6 hours ago
[-]
> A few PS3 games I've seen had 4GB or more binaries.

Is this because they are embedding assets into the binary? I find it hard to believe anyone was carrying around enough code to fill 4GB in the PS3 era...

reply
throwawaymobule
49 minutes ago
[-]
I assume so, there were rarely any other files on the disc in this case.

It varied between games; one of the Battlefields (3 or Bad Company 2) was what I was thinking of. It generally improved with later releases.

The 4GB file size was significant, since it meant I couldn't run them from a backup on a FAT32 USB drive. There are workarounds for many games nowadays.

reply
yablak
10 hours ago
[-]
> We would like to keep our small code-model. What other strategies can we pursue?

Move all the hot basic blocks near each other, right?

Facebook's solution: https://github.com/llvm/llvm-project/blob/main/bolt%2FREADME...

Google's:

https://lists.llvm.org/pipermail/llvm-dev/2019-September/135...

reply
setheron
5 hours ago
[-]
But for x86_64, as of right now, if even a single call needs more than 31 bits of displacement, you have to upgrade the whole code section to the large code model.

BOLT, AFAIU, is more about cache locality -- putting hot code near each other -- and not really about breaking the 2GiB barrier.

reply
jeffbee
4 hours ago
[-]
Why? Can't the linker or post-link optimizer reduce all near calls, leaving the more complicated mov with immediate form only where required?
reply
10000truths
10 hours ago
[-]
Debug symbol size shouldn't be influencing relocation jump distances - debug info has its own ELF section.

Regardless of whether you're FAANG or not, nothing you're running should require an executable with a 2 GB large .text section. If you're bumping into that limit, then your build process likely lacks dead code elimination in the linking step. You should be using LTO for release builds. Even the traditional solution (compile your object files with -ffunction-sections and link with --gc-sections) does a good job of culling dead code at function-level granularity.
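Something like this, using the standard GCC/clang spellings (a sketch; "myservice" is a made-up name):

    # Function-level dead code elimination without LTO:
    gcc -O2 -ffunction-sections -fdata-sections -c foo.c
    gcc -o myservice foo.o -Wl,--gc-sections
    # Or let the optimizer see the whole program; ThinLTO scales
    # to very large links:
    clang -O2 -flto=thin -c foo.c
    clang -flto=thin -o myservice foo.o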

reply
saagarjha
9 hours ago
[-]
Google Chrome ships as a 500 MB binary on my machine, so if you're embedding a web browser, that's how much you need minimum. Now tack on whatever else your application needs and it's easy to see how you can go past 2 GB if you're not careful. (To be clear, I am not making a moral judgment here, I am just saying it's possible to do. Whether it should happen is a different question.)
reply
throwawaymobule
7 hours ago
[-]
Do you have some special setup?

Chromium is in the hundred and something MB range on mine last I looked. Might expand to more on install.

reply
1vuio0pswjnm7
22 minutes ago
[-]
It was ~233MB as a dynamically-linked, stripped binary not too long ago.

Perhaps this shows how some developers may have few if any concerns about size constraints. The costs of this carefree attitude may then be passed on to end users.

I'm using a 2.0MB static binary to accomplish the same task as everyone else reading this, but without the colors, indentation or Javascript. Easy-to-read white text on a black background, in text mode, no X11 or the like.

I run static binaries from a rootfs mounted as tmpfs, so size does matter and 2.0MB is large for me.

Would a RAM "shortage" cause more people to write more resource-conserving programs?

reply
saagarjha
7 hours ago
[-]
I just checked Google Chrome Framework on my Mac; it was a little over 400 MB. Although now that I think about it, it's probably a universal binary, so you can cut that in half?
reply
trevor-e
4 hours ago
[-]
Yeah, looks like Chrome ships a universal binary with both x86_64 and arm64.
reply
yablak
9 hours ago
[-]
FAANGs were deeply involved in designing LTO. See, e.g.,

https://research.google/pubs/thinlto-scalable-and-incrementa...

And other refs.

And yet...

reply
jeffbee
7 hours ago
[-]
Google also uses identical code folding. It's a pretty silly idea that a shop that big doesn't know about the compiler flags.
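For the curious, ICF is a linker flag in gold and lld; "safe" skips folding functions whose addresses might be compared for identity (a sketch; "myservice" is a made-up name):

    clang -ffunction-sections -o myservice foo.o -fuse-ld=lld -Wl,--icf=safe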
reply
Orphis
2 hours ago
[-]
Google is made up of many thousands of individuals. Some experts will be aware of all of those flags, some won't. On my team, many didn't know about those details, as they were handled by other build teams for specific products or entire domains at once.

But since each product in some domains had to actively enable those optimizations for itself, they were occasionally forgotten, and I found a few missing in the app I worked for (but not directly on).

reply
jeffbee
2 hours ago
[-]
ICF seems like a good one to keep in the box of flags people don't know about because, like everything in life, it's a tradeoff, and keeping that one problematic artifact under 2GiB is pretty much the only non-debatable use case for it.
reply
meisel
7 hours ago
[-]
> Responses to my publication submissions often claimed such problems did not exist

I see this often even in communities of software engineers, where people who are unaware of certain limitations at scale will announce that the research is unnecessary

reply
loeg
4 hours ago
[-]
Sure! But there's a sleight of hand in the numbers here: we're talking about 25GB binaries with debuginfo and then 2GB maximum offsets in the .text section. Of those 25GB binaries, probably 24.5GB is debuginfo. You have to get into truly huge binaries before >2GB calls become an issue.

(I wonder, though I have no particular insight here, whether LTO builds can do smarter things -- most calls are local, but the handful of far calls could use the more expensive spelling.)

reply
benlivengood
41 minutes ago
[-]
At Google I worked with one statistics aggregation binary[0] that was ~25GB stripped. The distributed build system wouldn't even build the debug version because it exceeded the maximum configured size for any object file. I never asked if anyone had tried factoring it into separate pipelines, but my intuition is that the extra processing overhead wouldn't have been worth splitting the business logic that way; once the exact set of necessary input logs is in memory, you might as well do everything you need to them, given the dramatically larger ratio of data size to code size.

[0] https://research.google/pubs/ubiq-a-scalable-and-fault-toler...

reply
stncls
12 hours ago
[-]
> The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute JMP.

Makes sense, but in the assembly output just after, there is not a single JMP instruction. Instead, CALL <immediate> is replaced with putting the address in a 64-bit register, then CALL <register>, which makes even more sense. But why mention the JMP thing then? Is it a mistake or am I missing something? (I know some calls are replaced by JMP, but that's done regardless of -mcmodel=large)

reply
loeg
3 hours ago
[-]
I think the author is just noting that the construction is similar to an 8-byte JMP instruction. The text now reads:

> The simplest solution however is to use -mcmodel=large which changes all the relative CALL instructions to absolute 64bit ones; kind of like a JMP.

(We still need to use CALL in order to push a return address.)
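You can see the difference for yourself with something like this (a sketch; the exact register and mnemonic spelling vary by compiler version):

    // demo.c -- the store keeps caller() from becoming a tail-call jmp
    void callee(void);
    int done;
    void caller(void) { callee(); done = 1; }

    $ gcc -O2 -S -o - demo.c                 # small code model:
        call   callee                        # rel32 displacement, +/-2GiB reach
    $ gcc -O2 -mcmodel=large -S -o - demo.c  # large code model:
        movabs $callee, %rax                 # full 64-bit absolute address
        call   *%rax                         # indirect CALL; still pushes a return address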

reply
dwattttt
8 hours ago
[-]
I would assume loose language, referring to a CALL as a JMP. However of the two reasons given to dislike the large code model, register pressure isn't relevant to that particular snippet.

It's performing a call, and ABIs define registers that are not preserved across calls; writing the destination to one of those won't affect register pressure.

reply
wyldfire
7 hours ago
[-]
> What other strategies can we pursue?

You can use thunks/trampolines. lld can make them for some architectures, presumably also for x86_64. Though I don't know why it didn't in your case.

But, like the large code model, adding trampolines can be expensive, both in icache footprint and in plain execution time if a trampoline sits in a particularly hot path.
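For anyone unfamiliar, a range-extension thunk on x86-64 would look roughly like this (a sketch; names invented):

    call __thunk_for_far_func        # call sites keep the cheap rel32 CALL
    ...
    __thunk_for_far_func:
        movabs $far_func, %r11       # r11 is a scratch register in the SysV ABI
        jmp    *%r11                 # tail-jump to the real, far-away target

Only call sites whose targets end up more than 2GiB away need to be routed through a stub; everything else keeps the small-model encoding.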

reply
setheron
6 hours ago
[-]
In many ways that is what the PLT is also.

This is what my next post will explore. I ran into some issues with the GOT that I'll have to find solutions for.

I'm writing this for myself mostly. The whole idea of code models feels unnecessary when you have thunks.

reply
setheron
3 hours ago
[-]
reply
wyldfire
2 hours ago
[-]
> With this information, the necessity of code-models feels unecessary [sic]. Why trigger the cost for every callsite when we can do-so piecemeal as necessary with the opportunity to use profiles to guide us on which methods to migrate to thunks.

Does the linker have access to the same hotness information that the compiler uses during PGO? Well -- presumably it could, even if it doesn't now. But it would be like a heuristic with a hotness threshold? Do linkers "do" heuristics?

reply
doubletwoyou
14 hours ago
[-]
25 GiB for a single binary sounds horrifying

at some point surely some dynamic linking is warranted

reply
nneonneo
13 hours ago
[-]
To be fair, this is with debug symbols. Debug builds of Chrome were in the 5GB range several years ago; no doubt that’s increased since then. I can remember my poor laptop literally running out of RAM during the linking phase due to the sheer size of the object files being linked.

Why are debug symbols so big? For C++, they’ll include detailed type information for every instantiation of every type everywhere in your program, including the types of every field (recursively), method signatures, etc. etc., along with the types and locations of local variables in every method (updated on every spill and move), line number data, etc. etc. for every specialization of every function. This produces a lot of data even for “moderate”-sized projects.

Worse: for C++, you don’t win much through dynamic linking because dynamically linking C++ libraries sucks so hard. Templates defined in header files can’t easily be put in shared libraries; ABI variations mean that dynamic libraries generally have to be updated in sync; and duplication across modules is bound to happen (thanks to inlined functions and templates). A single “stuck” or outdated .so might completely break a deployment too, which is a much worse situation than deploying a single binary (either you get a new version or an old one, not a broken service).

reply
yjftsjthsd-h
13 hours ago
[-]
Can't debug symbols be shipped as separate files?
reply
loeg
3 hours ago
[-]
Yes, absolutely. Debuginfo doesn't impact .text section distances either way, though.
reply
bregma
9 hours ago
[-]
The problem is that when a final binary is linked everything goes into it. Then, after the link step, all the debug information gets stripped out into the separate symbols file. That means at some point during the build the target binary file will contain everything. I can not, for example, build clang in debug mode on my work machine because I have only 32 GB of memory and the OOM killer comes out during the final link phase.

Of course, separate symbol files make no difference at runtime, since only the LOAD segments get loaded (by either the kernel or the dynamic loader, depending). The size of a binary on disk has little to do with the size of a binary in memory.

reply
jcelerier
7 hours ago
[-]
> The problem is that when a final binary is linked everything goes into it

I don't think that's the case on Linux, when using -gsplit-dwarf the debug info is put in separate files at the object file level, they are never linked into binaries.

reply
yablak
9 hours ago
[-]
Yes, but it can be more of a pain keeping track of pairs. In production though, this is what's done. And given a fault, the debug binary can be found in a database and used to gdb the issue given the core. You do have to limit certain online optimizations in order to have useful tracebacks.

This also requires careful tracking of prod builds and their symbol files... A kind of symbol db.

reply
tempay
13 hours ago
[-]
I’ve seen LLVM dependent builds hit well over 30GB. At that point it started breaking several package managers.
reply
01HNNWZ0MV43FF
13 hours ago
[-]
I've hit the same thing in Rust, probably for the same reasons.

Isn't the simple solution to use detached debug files?

I think Windows and Linux both support them. That's how phones like Android and iOS get useful crash reports out of small binaries, they just upload the stack trace and some service like Sentry translates that back into source line numbers. (It's easy to do manually too)

I'm surprised the author didn't mention it first. A 25 GB exe might be 1 GB of code and 24 GB of debug crud.

reply
nicoburns
6 hours ago
[-]
> Isn't the simple solution to use detached debug files?

It should be. But the tooling for this kind of thing (anything to do with executable formats including debug info and also things like linking and cross-compilation) is generally pretty bad.

reply
dwattttt
9 hours ago
[-]
> I think Windows and Linux both support them.

Detached debug files has been the default (only?) option in MS's compiler since at least the 90s.

I'm not sure at what point it became hip to do that around Linux.

reply
kvemkon
1 hour ago
[-]
Since at least October 2003 on Debian:

[1] "debhelper: support for split debugging symbols"

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=215670

[2] https://salsa.debian.org/debian/debhelper/-/commit/79411de84...

reply
0xbadcafebee
13 hours ago
[-]
To be fair, they worked at Google, their engineering decisions are not normal. They might just decide that 25 GiB binaries are worth a 0.25% speedup at start time, potentially resulting in tens of millions of dollars' worth of difference. Nobody should do things the way Google does, but it's interesting to think about.
reply
dilyevsky
1 hour ago
[-]
Won't make a bit of difference because everything is in a sort of container (not Docker) anyway. Unless you're suggesting those libraries to be distributed as base image to every possible Borg machine your app can run on which is an obvious non-starter.
reply
flohofwoe
10 hours ago
[-]
The overall size wouldn't get smaller just because it is dynamically linked; on the contrary (DLLs are a dead-code-elimination barrier). 25 GB is insane either way; something must have gone horribly wrong very early in the development process. (Also, why even ship with debug information included? That doesn't make sense in the first place.)
reply
shevy-java
8 hours ago
[-]
25GB seems excessive, but I keep the basic compile toolchain as statically compiled executables. It simply works better when things go awry.
reply
reactordev
4 hours ago
[-]
Oh man, that first paragraph. “Such problems don’t exist…” What a gaslighting response to a publication submission. The least they could do is ask where this problem emerges; you can hand-wave your answer without revealing business IP.

Also, we as an industry of software engineers need to re-examine these hard limits we assumed would never be hit, such as the .text limits.

Anyway, very good read.

reply
a_t48
13 hours ago
[-]
I've seen terrible, terrible binary sizes with Eigen + debug symbols, due to how Eigen lazy evaluation works (I think). Every math expression ends up as a new template instantiation.
reply
forrestthewoods
11 hours ago
[-]
Eigen is one of the worst libraries when it comes to both exe size and compile times. <shudder>
reply
a_t48
2 hours ago
[-]
In terms of compile times, Boost.Geometry is somehow worse. You're encouraged to include boost/geometry.hpp, which pulls in every module and stalls compiles by several seconds just to parse all the templates. It's not terrible if you include just the headers you need, but that's not the "default" that most people use.
reply
forrestthewoods
20 minutes ago
[-]
boost is on my “do not ever use ever oh my god what are you doing stop it” list. It’s so bad.
reply
nicebyte
2 hours ago
[-]
Shameless plug: if you want to understand the content of this post better, first read the first half of my article on jumps [1] (up to syscall). It goes into detail about relocations and position-independent code.

[1] https://gpfault.net/posts/asm-tut-4.html

reply
gerikson
14 hours ago
[-]
The HN de-sensationalize algo for submission titles needs tweaking. Original title is simply "Huge Binaries".
reply
acosmism
12 hours ago
[-]
Agreed. "Binaries" is a bit too sensational for my taste. This can be further optimized.
reply
fuzzfactor
6 hours ago
[-]
"Files So Big They Might As Well Be Trinaries".
reply
binaryturtle
11 hours ago
[-]
"Bins"? :)
reply
bayindirh
10 hours ago
[-]
01.

Why not?

reply
DHRicoF
9 hours ago
[-]
False
reply