In this situation the C programmers can either a) accept that they're programming in a language that exists as it exists, not as they'd like it to exist; b) angrily deny a); or c) switch to some other system-level language with defined semantics.
I suspect it also depends on who exactly the compiler writers are; the GCC and LLVM folks seem to include more theoreticians/academics and thus think of the language more abstractly, leading to interpretations of UB that are truly inexplicable and divorced from any actual machine, while MSVC and ICC are more on the practical side and their interpretation of it is, as the standard says, "in a documented manner characteristic of the environment". IMHO the "spirit of C" and the more commonsense approach is definitely the latter, and K&R themselves have always leaned in that direction. This is very much a "letter of the law vs. spirit of the law" argument. The fact that these two sides have produced compilers with nearly the same performance characteristics shows, IMHO, that the claim that exploiting UB is mandatory for performance is a debunked myth.
If not, then, like ... sure, C compiler maintainers are people who program in C, but they're not "C programmers" as the phrase was intended (people who develop non-compiler software in C).
My hunch is that that statement is overwhelmingly true if measured by influence of a given C compiler/implementation stack (because GCC/LLVM/MSVC take up a huge slice of the market, and their maintainers are in many cases paid specialists who don't do significant work on other projects), but untrue if measured by count of people who have worked on C compilers (because there are a huge number of small-market-share/niche compilers out there, often maintained by groups who develop those compilers for a specific, often closed-source, platform/SoC/whatever).
[0] https://blog.regehr.org/archives/1287
> In contrast, we want old code to just keep working, with latent bugs remaining latent.
Well, just keep compiling it with the old compilers. "But we'd like to use new compilers for some 'free' gains!" Well, sucks, you can't. "But we have to use new compilers because the old ones just plain don't work on the newer systems!" Well, that sucks too, and this here is why "technical debt" is called "debt": you've managed to put off paying it until now, and the repo man is here knocking at your door.
I mostly work in compiled languages now, but started in interpreted/runtime languages.
When I made that switch, it was baffling to me that the compiled-language folks don't do compatibility-breaking changes more often during big language/compiler revision updates.
Compiled code isn't like runtime code--you can build it (in many cases bit-deterministically!) on any compiler version and it stays built! There's no risk of a toolchain upgrade preventing your software from running, just compiling.
After having gone through the browser compatibility trenches and the Python 2->3 wars, I have no idea why your proposal isn't implemented more often: old compiler/language versions get critical/bugfix updates where practical, new versions get new features and aggressively deprecate old ones. For example: "you want some combination of {the latest optimizations, loongarch support, C++-style attributes, #embed directives, auto vector zero-init}? Great! Those are only available on the new revision of the compiler where -Werror is the default and only behavior. Don't want those? The old version will still get bugfixes."
Don't get me wrong, backwards compatibility is golden...when it comes to making software run. But I think it's a mistake that back compat is taken even further when it comes to compilers, rather than the reverse. I get that there are immense volumes of C/C++ out there, but I don't get why new features/semantics/optimizations aren't rolled out more aggressively (well, I do--maintainers of some of those immense volumes are on language steering committees and don't want to spin up projects to modernize their codebases--but I'm mad about it).
"Just use an old compiler" seems like such a gimme--especially in the modern era of containers etc. where making old toolchains available is easier than ever. I get that it feels bad and accumulates paper cuts, but it is so much easier to deploy compiled code written on an old revision on a new system than it is to deploy interpreted/managed code.
(There are a few cases where compilers need to be careful there--thinking about e.g. ELF format extensions and how to compile code with consideration for more aggressive linker optimizations that might be developed in the future--but they're the minority.)
I know it’s not pleasant per se, but the level of support needed (easier now with docker and better toolchain version management utils than were the norm previously) surely doesn’t merit compilers carrying around the volume of legacy cruft and breaking-change aversion they do, no?
Contrast this with Linus' famous "we do not break userspace" rant which is the polar opposite of the gcc devs "we love to break your code to show how much cleverererer than you we are". Just for reference the exact quote, https://lkml.org/lkml/2012/12/23/75, is:
And you *still* haven't learnt the first rule of kernel maintenance? If a change results in user programs breaking, it's a bug in the kernel. We never EVER blame the user programs. How hard can this be to understand? ... WE DO NOT BREAK USERSPACE!
Ah, Happy Fun Linus. Can you imagine the gcc devs ever saying "if we break your code it's a problem with gcc" or "we never blame the user"? This really seems to be a gcc-specific problem. It doesn't affect other compilers like MSVC, Diab, IAR, Green Hills; it's only gcc and, to a lesser extent, clang. Admittedly this is from a rather small sample, but the big difference between those two sets that jumps out is that the first one is commercial, with responsibilities to customers, and the second one isn't.
I think that GCC has changed a bit in recent years, but I am also not sure that an optimizing compiler can have the same policy as the kernel. For the kernel, it is about keeping APIs stable, which is realistic, but an optimizing compiler inherently relies on some semantic interpretation of the program code, and if there is a mismatch that causes something to break, it is often difficult to fix. Also, many issues were not caused by someone suddenly deciding "let's now exploit this UB we haven't exploited before"; the compiler always relied on it, but an improved optimization now affects more or different programs. This creates a difficult situation, because it is not clear how to fix it without rolling back an improvement you spent a lot of time on and others paid for. Don't get me wrong, I agree they went too far in the past in exploiting UB, but I do think this is less of a problem looking forward, and there is also generally more concern about the impact on safety and security now.
I think a lot of the UB, though, isn't "let's exploit UB"; it's "we didn't even know we had UB in the code". An example is two's-complement arithmetic, which the C language has finally acknowledged more than half a century after the last non-two's-complement machine was built (was the CDC 6600 the last ones'-complement machine? Were most of the gcc devs even born when that was released?). So everyone on earth has been under the perfectly sensible notion that their computer uses two's-complement maths, while the gcc (and clang) devs know that signed overflow is actually UB, and that it allows them to do whatever they want with your code when they encounter it.
If you don't please your users, you won't have any users.
By any metric, C++ is one of the most successful programming languages devised by mankind, if not the most successful.
What point were you trying to make?
I think claiming that C++ is successful because of the unintuitive-behavior-causing compiler behaviors/parts of the spec is an extraordinary claim--if that's what you mean, then I disagree. TFA discusses that many of the most pernicious UB-causing optimizations yield paltry performance gains.
Back in the 80s, I was looking for a way to enhance my C compiler. I looked at Objective-C and C++. There was a newsgroup for each, and each had about the same amount of traffic. I had to pick one.
Objective-C required a license to implement it. I asked AT&T if I needed a license to implement C++, and could I call it C++. AT&T's lawyer laughed and said feel free to do whatever you want.
So that decided it for me. At the time, C++ did not exist on the PC other than the awkward, nearly unusable cfront (which translated C++ to C). At the time, 90% of programming was done on the PC.
I implemented it. It was the first native C++ compiler for the PC. (It is arguable that it was the first native C++ compiler, depending on whether a gcc beta is considered a release.)
The usage of it exploded. The newsgroup traffic for C++ zoomed upwards, and Objective-C interest fell away. C++ built critical mass because of Zortech C++.
Borland dropped their plans for an OOP language and went for Turbo C++. Microsoft also had a secret OOP C language called C*, which was also abandoned in favor of implementing C++.
And the rest is history!
P.S. cfront on the PC was unusable because it was 1) incredibly slow and 2) did not support near/far pointers which was required for the mixed PC memory models.
P.P.S. Bjarne Stroustrup never mentioned any of this in his book "The Design and Evolution of C++".
Nowadays, UB means something completely different - if at any point in time, the compiler reasons out that a piece of code is only reachable via UB, it will assume that this can never happen, and will quietly delete everything downstream:
As in, everything down from UB is only working by an accident of implementation that does not need to hold, and you should explicitly not rely on that. Whether the compiler happens to explicitly make it not ever work or just leaves it to fate should not be relevant.
UB just meant "the spec doesn't define what happens". It didn't use to mean "the compiler can just decide to do any wild thing if your program touches UB anywhere at any time". Hell, with the modern definition UB can apparently time travel: you don't even need to execute UB code for it to start doing weird shit in some cases.
UB went from "whatever happens when your compiler/hardware runs this is what happens" to "Once a program contains UB the compiler doesn't need to conform to the rest of the spec anymore."
>UB just meant "the spec doesn't define what happens"
What comes to mind is that then the written code is operating on a subspec, one that is probably undocumented and maybe even unintended by the specifics of that version and platform.
It sounds like it could create a ton of issues, from code that can’t be ported to difficulty in other person grokking the undocumented behavior that is being used.
In this regard, as someone that could potentially inherit this code I’d actually want the compiler to stop this potential behavior. Am I missing something? Is the spec not functional enough on its own to rely just on that?
    int handle_untrusted_numbers(int a, int b) {
        if (a < 0) return ERROR_EXPECTED_NON_NEGATIVE;
        if (b < 0) return ERROR_EXPECTED_NON_NEGATIVE;
        int sum = a + b;
        if (sum < 0) {
            return ERROR_INTEGER_OVERFLOW;
        }
        return do_something_important_with(sum);
    }
Every computer you will ever use has two's complement for signed integers, and the standard recently recognized and codified this fact. However, the UB fanatics (heretics) insisted that not allowing signed overflow is an important opportunity for optimizations, so that last if-statement can be deleted by the compiler and your code quietly doesn't check for overflow any more. There are plenty more examples, but I think this is one of the simplest.
It's some of the most user-hostile behavior I've ever encountered in an application.
The three behaviours relevant in this discussion, from section 3.4:
3.4.1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
An example of undefined behavior is the behavior on integer overflow.
3.4.4 unspecified behavior
behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
An example of unspecified behavior is the order in which the arguments to a function are evaluated.
K&R also seems to mention "undefined" and "implementation-defined" behaviour on several occasions. It doesn't specify what is meant by undefined behaviour, but it does indeed seem to be "whatever happens, happens" instead of "you can do whatever you want." But ISO C99 seems to be a lot looser with its definition. Using integer overflow, as in your example, for optimization has been shown to be beneficial by Chandler Carruth in a talk he did at CppCon in 2016.[1] I think it would be best to have something similar to Zig's wrapping and saturating addition operators instead, but for that I think it is better to just use Zig (which I personally am very willing to do once it reaches 1.0 and other compiler implementations are available).[2]
[1] https://youtu.be/yG1OZ69H_-o?si=x-9ALB8JGn5Qdjx_&t=2357 [2] https://ziglang.org/documentation/0.15.2/#Operators
It's also worth noting that even with the current very liberal handling of UB, the actual code sample in [1] was still missing this optimization; so it's not like liberal UB handling automatically led to faster code; understanding of the compiler was still needed.
The question is one of risk: if the compiler is conservative, you're risking slightly less optimized code. If the compiler is very liberal and assumes UB never happens, you're risking that it will wipe out your overflow check like in my godbolt (I've seen actual CVEs due to that, although I don't remember the project).
What every compiler writer should know about programmers (2015) [pdf] - https://news.ycombinator.com/item?id=19659555 - April 2019 (62 comments)
What every compiler writer should know about programmers [pdf] - https://news.ycombinator.com/item?id=11219874 - March 2016 (106 comments)
https://www.yodaiken.com/2021/05/19/undefined-behavior-in-c-...
And here's a cautionary tale of how a compiler writer doing whatever they wish once they encounter undefined behavior makes debugging intractable:
https://www.quora.com/What-is-the-most-subtle-bug-you-have-h...
By their own admission, the compiler warns about the UB. "-Wanal"¹, as some call it, makes it an error. Under UBSan the program aborts with:

    code.cpp:4:6: runtime error: execution reached the end of a value-returning function without returning a value

… "intractable"?

¹ a humorous name for -Wextra -Wall -Werror
The -Werror flag is not used religiously even for building, e.g., the Linux kernel, and -Wextra can introduce a lot of extraneous garbage.
This will often make it easier (though still difficult) to winnow the program down to a smaller example, as that person did, rather than to enable everything and spend weeks debugging stuff that isn't the actual problem.
Yeah, I know it breaks the common illusion among the C programmers that they're "close to the bare metal", but illusions should be dispelled, not indulged. The C programmers program for the abstract C machine, which is then mediated by the C compilers into machine code in the way the implementers of C compilers have publicly documented.
Moreover, compiler authors don't just go out maliciously trying to ruin programs through finding more and more torturous undefined behavior for fun: the vast majority of undefined behavior in C are things that if a compiler wasn't able to assume were upheld by the programmer would inhibit trivial optimizations that the programmer also expects the compiler to be able to do.
That is to say, I find "could not happen" the most bizarre reading to make when optimizing around undefined behavior. "Whatever the machine does" makes sense, as does "we don't know". But "could not happen"? If it could not happen, the spec would have said "could not happen"; instead, the spec does not know what will happen and so punts on the outcome, knowing full well that it will happen all the time.
The problem is that there is no optimization to make around "whatever the hardware does" or "we have no clue" so the incentive is to choose the worst possible reading "undefined behavior is incorrect code and therefore a correct program will never have it".
I would imagine that the standard writers choose one or the other depending on whether the behavior is useful for optimizations. There's also the matter that if a behavior is currently undefined, it's easy to later on make it unspecified or specified, while if a behavior is unspecified it's more difficult to make it undefined, because you don't know how much code is depending on that behavior.
It's practically impossible to find a program without UB.
I mean, if you're going to argue that a compiler can do anything with any UB, then by all means make that argument.
Otherwise, then no, I don't think it's reasonable for a compiler to cause an infinite loop inside a function simply because that function itself doesn't return a value.
https://www.quora.com/What-is-the-most-subtle-bug-you-have-h...
The problem was that the loop itself was altered, rather than that the function returned and then that somehow caused an infinite loop.
> I'm not aware of any compiler that does that, but it's something I could see happening, and the developers would have no reason to "fix" it, because it's perfectly up to spec.
This is where we disagree.
https://people.csail.mit.edu/nickolai/papers/wang-stack.pdf
I submit that that's a small fraction of UB, that much of it would exist at any optimization level.
It's actually a much more torturous reading to say "if any line in the program contains undefined behavior (such as the example given in the standard, integer overflow), then it's OK for the compiler to treat the entire program as garbage and create any behavior whatsoever in the executable."
Which is exactly what had been claimed, that he was addressing.
Sure, but it's unlikely it's an intentional choice to cause an infinite loop simply because your boolean function didn't return a boolean.
But also note that there is an ongoing effort to remove UB from the standard. We have already eliminated about 30% of the UB in the core language for the upcoming version, C2Y.
Honestly, I do not think the problem in C is so big that one needs to jump ship. There are real issues, yes, but there are also plenty of good tools and strategies to deal with UB; it is not really an issue for me.
The only dead code is code generated by macros.
(1) "dead" meaning unused types, unreachable branches
    if condition that is "always" false:
        abort with message detailing the circumstances

That `if` is "dead", in the sense that the condition is always false. But "dead" sometimes is just a proof — or, if I'm not rigorous enough, an assumption — in my head. If the compiler can prove the same proof I have in my head, then the dead code is eliminated. If it can't, well, presumably it is left in the binary, either to never be executed, or to be executed in the case that the proof in my head is wrong.

1. you drop down to assembly.
2. you use functions that are purpose built to be sequence points the optimizer won't optimize through. E.g., in Rust, for the case you mention, `read_volatile`.
In either case, this gives the human the same benefit the code is giving the optimizer: an explicit indication that this code that might appear to be doing nothing isn't.
There are even languages with mandatory else branch.
My point is that it is easy to say "don't remove my code" while looking at a simple single-function example, but in actual compilation huge portions of a function are "dead" after inlining, constant propagation and other optimizations, before even getting to C-specific UB or other shenanigans. You don't want to throw that out.
On the one hand, having the optimizer save you from your own bad code is a huge draw, this is my desperate hope with SQL, I can write garbage queries and the optimizer will save me from myself.
But... Someone put that code there, spent time and effort to get that machinery into place with the expectation that it is doing something. and when the optimizer takes that away with no hint. That does not feel right either. Especially when the program now behaves differently when "optimized" vs unoptimized.
    int factorial(int x) {
        if (x < 0) throw invalid_input();
        // compute factorial ...
    }
This doesn't have any dead code in a static examination; at compilation time, however, this function may be compiled multiple times, e.g., as factorial(5) or as factorial(x) where x is known to be non-negative by range analysis. In this case, the `if (x < 0)` is simply pruned away as "dead code", and you definitely want this! It's not a minor thing, it's a core component of an optimizing compiler. This same pruning is also responsible for the objectionable pruning away of dead code in the examples of compilers working at cross-purposes to programmers, but it's not easy to have the former behavior without the latter, and that's also why something like -Wdead-code is hard to implement in a way which wouldn't give constant false positives.
I'm talking about the optimizer, not the linker, which thankfully does a lot of pruning.
It's very common for inline functions in headers to be written for inlining and constant propagation from arguments result in dead code and better generated code. There is even __builtin_constant_p() to help with such things (e.g., you can use it to have a fast folded inline variant if an argument is constant, or call big out of line library code if variable).
There are also configuration systems that end up with config options in headers that code tests with if (CONFIG_BLAH) {...} that can evaluate to zero in valid builds.
However, they're not in widespread use. I would be curious to learn if there's any data/non-anecdotal information as to why. Is it momentum/inertia of GCC/LLVM/MSVC? Are the alternative compilers incomplete and unable to compile a lot of practical programs (belying the "relatively simple program" claim)? Or is the performance differential due to optimizations really so significant that ordinary programs like e.g. vim or libjpeg or VLC have significant degradations when built with an alternative compiler?
I stopped reading at the abstract; garbage rant full of contradictions.