I write type-safe generic data structures in C
399 points
1 day ago
| 27 comments
| danielchasehooper.com
| HN
o11c
1 day ago
[-]
For your level 2 code, `uint64_t data[];` is wrong for types whose alignment is greater than `uint64_t`, and also wasteful for types whose alignment is smaller (for example, under an ilp32 ABI on 64-bit architectures).

For your level 3 code, it should be `int main() { List(Foo) foo_list = {NULL};`

Note that working around a lack of `typeof` means you can't return anything. Also, your particular workaround allows `const`ness errors since `==` is symmetrical.

You can't safely omit `payload` since you need it to know the correct size. Consider a `List(int64_t)` and you try to add an `int32_t` to it - this should be fine, but you can't `sizeof` the `int32_t`. Your code is actually lacking quite a bit to make this work.

=====

There are 2 major limitations to generics in C right now:

* Delegating to a vtable (internal or external) is limited in functionality, since structs cannot contain macros, only functions.

* Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with. So far the best approach I've found is to declare (but not define) static functions in the same forwarding header I declare the typedefs in; note that GCC and Clang differ in what phase the "undefined static" warning appears in for the case where you don't actually include that particular type's header in a given TU.

(think about writing a function that accepts either `struct SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer {void *begin; void *end;};`, and also const versions thereof - all from different headers).

reply
rectang
1 day ago
[-]
> Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with.

We went down the rabbit hole of writing a compiler for this as part of a project I used to work on (Apache Clownfish[1], a subproject of the retired Apache Lucy project). We started off parsing .h files, but eventually it made sense to create our own small header language (.cfh "Clownfish Header" files).

Here's some generated code for invoking the CharBuf version of the "Clone" method defined in parent class "Obj":

    typedef cfish_CharBuf*
    (*CFISH_CharBuf_Clone_t)(cfish_CharBuf* self);

    extern uint32_t CFISH_CharBuf_Clone_OFFSET;

    static inline cfish_CharBuf*
    CFISH_CharBuf_Clone(cfish_CharBuf* self) {
        const CFISH_CharBuf_Clone_t method
            = (CFISH_CharBuf_Clone_t)cfish_obj_method(
                self,
                CFISH_CharBuf_Clone_OFFSET
            );
        return method(self);
    }
Usage:

    cfish_CharBuf *charbuf = cfish_CharBuf_new();
    cfish_CharBuf *clone = CFISH_CharBuf_Clone(charbuf);
We had our reasons for going to these extremes: the point of Clownfish was to provide a least-common-denominator object model for bindings to multiple dynamic languages (similar problem domain to SWIG), and the .cfh files also were used to derive types for the binding languages. But there was truly an absurd amount of boilerplate being generated to get around the issue you identify.

This is why almost everybody just uses casts to void* for the invocant, skipping type safety.

[1] https://github.com/apache/lucy-clownfish

reply
zem
1 day ago
[-]
i am firmly of the opinion that compiling to c is a better route than doing clever c tricks to sort of get what you want. the compiler can be pretty minimal and as you note it pays for itself.
reply
jiggawatts
20 hours ago
[-]
There’s some prior work called CFront. It implements a superset of C that’s just an “increment”. I think it’s worth looking into, it might take off one day!
reply
mingodad
15 hours ago
[-]
Here I've got it to work with recent compilers/OSs https://github.com/mingodad/cfront-3
reply
antonvs
10 hours ago
[-]
I dunno, I looked at that a while back - the version for MS-DOS shipped on multiple floppy disks. Not exactly lightweight!
reply
kccqzy
1 day ago
[-]
> it should be `int main() { List(Foo) foo_list = {NULL};`

In C `int main()` means the function takes an unknown number of arguments. You need `int main(void)` to mean it doesn't take any arguments. This is a fact frequently forgotten by those who write C++.

reply
flohofwoe
1 day ago
[-]
That had been harmonized with C++ in C23 (e.g. func() is equivalent with func(void) now).

It's not really relevant for main() though, even in older C versions main() works fine and simply means "I don't need argc and argv".

reply
el_pollo_diablo
1 day ago
[-]
This is about a function definition, not a random function declarator. C23 does not change anything in that case.
reply
tedunangst
1 day ago
[-]
This is incorrect. In a function definition, an empty list means it takes no parameters. 6.7.5.3 Function declarators

> 14. An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters.

reply
zombot
16 hours ago
[-]
"has no parameters" is not the same as "cannot take arguments". Defining `int main()` does not stop the runtime from passing the usual 3 arguments (typically named argc, argv, envp), it only means that no parameters are bound to those arguments. Technically it's no problem to have a C function ignore its arguments by not binding parameters. Way too many programmers seem to not understand the difference between parameter and argument.

https://stackoverflow.com/questions/156767/whats-the-differe...

reply
antonvs
6 hours ago
[-]
> "has no parameters" is not the same as "cannot take arguments".

In C I guess that's true. In languages more concerned with compile-time rigor, it often isn't. Not a correction, just an observation.

reply
MangoToupe
13 hours ago
[-]
Surely part of the problem is having a distinct term and handling for parameters passed to functions. What is the point? It seems confusing with no upside.
reply
zombot
13 hours ago
[-]
Do you find the difference between abstract and concrete confusing? Or the difference between container and contents? Is that a pointless distinction with no upside?
reply
MangoToupe
11 hours ago
[-]
I do agree these are useful concepts to distinguish, but I don't get the connection to the topic at-hand. To me, there is just the function signature. I don't see a benefit to referring to passed values as distinct from received values. To my ear "argument" and "parameter" are perfect synonyms.
reply
antonvs
9 hours ago
[-]
> referring to passed values as distinct from received values.

That’s not the distinction being made by those terms.

“Parameter” refers to a named variable in a function definition.

“Argument” refers to an actual value that’s passed to a function when it’s called.

It’s exactly the same as the distinction between variables and values (which you probably see the use for), just applied to the special cases of function signatures and function calls.

(As an aside, in the lambda calculus this relationship becomes a perfect equivalence: all variables are parameters and all values are arguments.)

reply
MangoToupe
8 hours ago
[-]
Well, I certainly would not interpret the terms that way, but you do you.
reply
antonvs
7 hours ago
[-]
It's the standard definition in a software development context, which you can find all over the place. Here are some examples:

> "Parameters are named variables declared as part of a function. They are used to reference the arguments passed into the function."

-- MDN, https://developer.mozilla.org/en-US/docs/Glossary/Parameter

> "A parameter is a special kind of variable used in a function to refer to one of the pieces of data provided as input to the function. These pieces of data are the values of the arguments with which the function is going to be called/invoked."

-- Programming Fundamentals, https://press.rebus.community/programmingfundamentals/chapte...

> "Parameters refer to the variables listed in a function's declaration, defining the input that the function can accept. Arguments, however, are the actual values passed to the function when it is called, filling the parameters during execution."

-- https://www.geeksforgeeks.org/computer-science-fundamentals/...

While you might be tempted to "do you" and use your own idiosyncratic definitions, I advise against it, since it makes it difficult for you to understand what others are saying, and vice versa.

reply
rangerelf
7 hours ago
[-]
lol, it's not a "you do you" thing, that's what they're actually named, "parameters" and "arguments" have distinct objective definitions in this context and those are it. In this specific case it's you who's using made up words for concepts that others already have a specific name for.
reply
MangoToupe
5 hours ago
[-]
> that's what they're actually named

...by what authority? c'mon, communication is important, and insisting on the correctness of definitions tanks that.

EDIT: however, I will concede there's good evidence for widespread usage of this, and I'll adjust my usage accordingly. Insisting on "correctness" is just asinine, though.

reply
s3graham
1 day ago
[-]
As you surely know if you're quoting the standard, it depends on which standard!
reply
tedunangst
1 day ago
[-]
Quote a different standard.
reply
gpderetta
1 day ago
[-]
I believe that since C23 foo() is now a nullary function. As this is the last approved standard and it supersedes all previous standards, it is technically correct to say that de-jure this is what the (unqualified) C standard mandates.

Of course de-facto things are more nunanced.

reply
el_pollo_diablo
1 day ago
[-]
C23 does not change anything in this situation, because we are talking about the definition of main(), not a forward declaration. More details here:

https://news.ycombinator.com/item?id=38729278#38732366

reply
fuhsnn
18 hours ago
[-]
In what situation fn() doesn't mean fn(void) under C23?
reply
el_pollo_diablo
18 hours ago
[-]
None, but that is not my point. Before C23, fn() already meant the same thing as fn(void) in function definitions, which the situation under discussion here.

C23 changed what fn() means outside a function definition.

reply
fuhsnn
17 hours ago
[-]
Oh, yeah, the codegen for the fn() itself would likely be the same, but the prototype of that definition is still a K&R function. https://godbolt.org/z/Psvae55Pr
reply
EPWN3D
1 day ago
[-]
I would love for `union`s to be federated, that is, a type could declare itself as thought it was part of a union with another type, without having to pre-declare all possible types in one place.
reply
Joker_vD
15 hours ago
[-]
Can't you just declare anonymous unions whenever? E.g.

    struct foo { ... };
    
    struct bar { ... };

    struct bar check_this_out(struct foo foo) {
        return ((union { struct foo foo; struct bar bar; }){ .foo = foo}).bar;
    }

    struct bar *also_this(struct foo *foo) {
        union { struct foo foo; struct bar bar; } *tmp = (void*)foo;
        return &tmp->bar;
    }
reply
o11c
1 day ago
[-]
For layout-compatible types, you can often just include a `_base` member in each child. Maybe twice (once named and once unnamed) to avoid excess typing - I don't understand the common-initial-subsequence rule but people do this enough that compilers have to allow it.
reply
manwe150
8 hours ago
[-]
I don't think compilers typically allow that: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14319
reply
n_plus_1_acc
1 day ago
[-]
This is also problematic, because there might be padding and the calculated size might be too small:

`malloc(sizeof(*node) + data_size);`

reply
o11c
1 day ago
[-]
There's no a problem with the author's current code, since the padding is already included in the node size, but it would be a problem after doing alignment more intelligently.
reply
gritzko
1 day ago
[-]
Hi. I object.

The trick#0 you mention is how I made an entire C dialect. Here is a generic binary heap, for example https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The syntax is a bit heavyweight, but a huge huge advantage is: you get regular C structs in the end, very plain, very predictable, very optimizable. Compiler would eat them like donuts.

In the other cases, it is void* and runtime memory sizing and you have to define macros anyway.

reply
dhooper
1 day ago
[-]
Author here. Binary heaps and linked lists are different use cases. A binary heap must read the data you put in it to store it correctly, but a linked list doesn't. If I were writing a generic binary heap, maybe I would weigh my options differently. I mentioned this in the footnotes.
reply
wordglyph
1 day ago
[-]
And that's why I like C++ templates
reply
variadix
1 day ago
[-]
I agree, there are actually several reasons to prefer the header impl. Debugging is better, both because you can step through the header code where you can’t with a macro function, and because the type information available to the debugger is better. There are more opportunities for compiler optimizations because each instantiation is monomorphized and you don’t pay a runtime cost with variable sizing, generic structures can also be placed on the stack because of the fixed sizing.

There are workarounds for at least two of the problems the author mentions. Naming can be changed from Bar_func(args…) to func(Bar)(args…) with a function name macro that just does name mangling. You can avoid some of the binary bloat by using weak symbols, letting the linker deduplicate functions shared between translation units at link time.

There are other problems for generic containers of pointer types however, you can work around them by using a typedef or a type alias.

Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain.

reply
dhooper
1 day ago
[-]
Author here. It's worth noting that no work is being done in the macros of my article, they compile down to a normal c function call which you can step through in a debugger.

There is little benefit in monomorphizing the implementation of a data structure like a linked list where its behavior doesn't depend on the contents of the data it contains (compared to, say, a max heap)

reply
knutwannheden
1 day ago
[-]
> Compiler would eat them like donuts.

Made me laugh out loud!

reply
layer8
1 day ago
[-]
The casting of the function type assumes that the item pointer type (e.g. Foo*) has the same representation as void*, which the C standard doesn’t guarantee (in standardese: the two types aren’t “compatible”). Calling the function with the converted type therefore constitutes undefined behavior. It also impacts aliasing analysis by compilers (see [0], incidentally), even if the pointer representation happens to be the same.

This casting of the functions to different argument types constitutes the core of the type safety of the generic invocations; I’m not sure it can be fixed.

[0] https://news.ycombinator.com/item?id=44421185

reply
dhooper
1 day ago
[-]
This is addressed in the footnotes. casting is not the core of the type safety. Read the whole article.
reply
layer8
1 day ago
[-]
Ah, that’s what I get for not reading the footnotes. However, the alternative solution presented evaluates the item argument twice, which is problematic as well (but could probably be worked around by passing `(list)->payload` on instead). Secondly, the assignment for type-checking doesn’t work for read-only operations on a const List, or does it? And doesn’t the assignment overwrite the head? Lastly, the do-while construction means you can’t use it for operations that return a value (without compiler extensions).

I also don’t agree it’s “squeamish” to be wary of aliasing analysis going wrong. It’s not a clean abstraction and can hide subtle bugs down the road.

reply
el_pollo_diablo
1 day ago
[-]
Sure, but your alternative code incorrectly assigns to (list)->payload. You have many other options. Without typeof, you can if(0) the assignment, or check type compatibility with a ternary operator like 1 ? (item) : (list)->payload and pass that to _list_prepend, etc. With typeof, you can store item in a temporary variable with the same type as (list)->payload, or build a compound literal (typeof(*(list))){.payload=(item)}, etc.
reply
dhooper
1 day ago
[-]
The assignment is intentional. The union changed to a struct
reply
Joker_vD
15 hours ago
[-]
Working with function pointers is always finicky. I once had MSVC fold together

    int f0(int x) { return x; }

    int f1(int x, int y) { return y; }
into a single function, so at runtime, (void(*)())f0 and (void(*)())f1 would compare equal. AFAIK, you are not guaranteed by the standard that functions of different signatures would, when their addresses are taken, result in different pointers, so it's not technically a bug... but it's quite surprising, and ruins certain tricks.
reply
chuckadams
9 hours ago
[-]
I believe "type witness" is the generic (hah) term for "member that doesn't do anything but hold the type". Lot less literature out there about type witnesses than I had thought though...
reply
gizmo686
9 hours ago
[-]
There's a similar term "phantom type", for when you have some sort of type variable that is never used as the type of an actual variable.

I've mostly seen in done in Haskell, but have used it in it Scala as well to simulate type hierarchies not present in the actual type system.

In a way, this union trick is kind of like a phantom type, since the secondary type is never actually used.

reply
chuckadams
9 hours ago
[-]
Yah, I tend to think of type witnesses as actually existing at runtime and phantom types not, but in the union trick, they don't really exist either. So thinking on it some more, seems more of a way of expressing type parameters in the first place, and well, that's what the article was about.

Now I'm wondering what phantom types would look like in C...

8-/

reply
cherryteastain
1 day ago
[-]
Why would you jump through all these hoops instead of just writing C++ if you want "C with generics"
reply
_proofs
1 day ago
[-]
because i work on a legacy project that is coupled to safety regulations and other quality guarantees, and we cannot just simply roll out a solution ported to c++ on the next release, or even tenth, so perhaps we make it work until we can.

however we can set a standard and expectation for new projects to use c++, and we do and set an expectation to target a specific std.

i see this sentiment quite a lot on hackernews -- feels like a lot of people saying "git gud" -- i would expect a lot more nuance applied here.

reply
gnulinux996
59 minutes ago
[-]
Out of curiosity, what is an example of a regulation that specifies using a specific programming language and/or toolchain out there?
reply
pjmlp
12 hours ago
[-]
While using old compilers in certified domains is a common problem, unless we are talking about stuff like PIC, the choice of reaching out to C++ compilers is a given, even if what they might support is between C++98 and C++11.
reply
Kranar
1 day ago
[-]
Because for many of the use cases where C is used, switching to C++ involves jumping through even more hoops.
reply
lionkor
1 day ago
[-]
Do you have a couple of real world examples?
reply
mikepurvis
1 day ago
[-]
Any established C codebase, for example the kernel or Postgres?

Traditionally microcontroller firmwares as well, though those are increasingly friendly to C++, you just have to be careful about allocations as C++ makes it way easier to accidentally allocate than C does.

reply
TuxSH
12 hours ago
[-]
> Any established C codebase, for example the kernel or Postgres?

Obviously you mean the Linux kernel, specifically. Also C++ has gotten a lot better since 2003 and before.

Examples of C++ usage in commercial codebases:

- anything Nintendo has written since 2010 (including microkernel, entire OS and all, for 3DS and Switch 1/2)

- well-known, high performing databases like ScyllaDB

> you just have to be careful about allocations as C++ makes it way easier to accidentally allocate than C does.

With the exception of exceptions (and coroutines but that's easily customizable), it's merely a standard library thing and doesn't affect language features (incl. language features exposed as std:: funcs/traits)

C++ has got a bad rep because it never fixes broken stdlib parts due to API and ABI concerns, and has many footguns due to this, that might make Rust and C better choices in corporate environments. However, IMHO, C++ is a much better language than C for personal projects, or when having a team of experienced folks.

reply
gpderetta
12 hours ago
[-]
> Any established C codebase

Anecdotally, GCC and GDB successfully (and incrementally) switched to C++ form C in the recent past.

The Linux kernel will never do it for ideological reasons of course.

I don't know about Postgres.

reply
cryptonector
7 hours ago
[-]
> > Any established C codebase, for example the kernel or Postgres?

> Obviously you mean the Linux kernel, specifically.

Or any BSD, or Illumos, or Solaris, or any Unix-derived kernel, or... Even the Windows kernel is in C (or a very cut-down C++).

reply
charcircuit
19 hours ago
[-]
Nothing is stopping you from linking C++ code to Postgres.
reply
cryptonector
7 hours ago
[-]
But if you want to _contribute_ to PostgreSQL, it has to be in C.
reply
neonz80
1 day ago
[-]
I'm not sure about other compilers, but compiling C code as C++ with MSVC ends up with pretty much the exact same code, instruction by instruction. C++ is a bit more strict though especially with casting, so a lot of code won't compile out of the box.
reply
vbezhenar
1 day ago
[-]
C++ code compiles to a different function names in object file (name mangling). You probably need to put a lot of ugly `#ifdef __cplusplus extern "C" {` boilerplate in your headers, otherwise C and C++ files will not compile together.
reply
winocm
1 day ago
[-]
Don't forget the infamous pattern used in some C projects too:

  struct foo decl = {
    .member = /* ... */
    .next = &(struct nested_pointer) {
        .nested_member = /* ... */,
    },
    .array = (struct nested_array[]) {
      [0] = { /* ... */ },
    }
  };
This pattern does not work in C++ as the nested declarations become temporaries.
reply
_proofs
1 day ago
[-]
literally a good majority of existing embedded software coupled to applications in safety -- devices used by fire safety and first responders.
reply
rectang
1 day ago
[-]
Writing extensions for projects that support C extensions but may not support C++ extensions, e.g. many dynamic languages.
reply
Snarwin
1 day ago
[-]
You can still write the extension in C++ and expose an extern "C" interface.
reply
rectang
1 day ago
[-]
That's possible, but then the people building your extension need a C++ toolchain.

The question was "please provide examples where switching to C++ involves jumping through even more hoops", and in my view requiring downstream to use a C++ environment when they're expecting to use a C environment qualifies.

reply
uecker
1 day ago
[-]
True. For me, C++ itself is the maze of hoops I would rather want to avoid.
reply
pjmlp
12 hours ago
[-]
Most C compilers are C++ toolchain as well, we are no longer in the 1990's.

Unless of course the project is using such a old compiler.

reply
rectang
10 hours ago
[-]
Problems in this domain arise more because you’re wandering off the beaten path for configuration and tooling than because the systems lack absolute capabilities. If you don’t want to build your extension the way that the extension framework expects you, all sorts of annoyances show up. Maintaining an cross-platform compatible C extension is hard enough already.
reply
adastra22
1 day ago
[-]
Embedded systems, for example.
reply
teraflop
1 day ago
[-]
I know it used to be, but is it really still common for embedded systems to use weird architectures that G++/Clang don't support?
reply
adastra22
1 day ago
[-]
Unless it is a popular system or common architecture, yes.
reply
jenadine
18 hours ago
[-]
Could you show me an example of a micro controller still supported today which doesn't have a C++ compiler?
reply
foldr
13 hours ago
[-]
The 8051. I think C++ compilers technically exist for it, but for most hardware the only practical choice is the Keil C51 compiler, which is C89.
reply
jjmarr
17 hours ago
[-]
> is it common for weird architectures to exist?

> yes, unless you're using a common one.

reply
adastra22
10 hours ago
[-]
That isn’t a contradiction.
reply
pjmlp
12 hours ago
[-]
Only very weird ones that barely support C like PIC, most ones support at least C++98 or C++11.
reply
pjmlp
12 hours ago
[-]
Because some people really hate C++ to their bones, hence this kind of stuff that keeps coming up.

I was really disappointed that Microsoft decided to backtrack on C++'s is the future, after the new found Linux and FOSS love.

https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...

https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...

Not that it matters much, as nowadays C and C++ have a new policy at Microsoft due to goverments and cyberlaws.

https://azure.microsoft.com/en-us/blog/microsoft-azure-secur...

https://blogs.windows.com/windowsexperience/2024/11/19/windo...

reply
petabyt
18 hours ago
[-]
The real answer: it's more fun this way.
reply
brunker2
1 day ago
[-]
Why would you write C++ if you can get the same result by jumping through a few hoops with C?
reply
zabzonk
1 day ago
[-]
Templates in C++ require language support - you can't simply implement them with "a few hoops" in C.
reply
ramon156
17 hours ago
[-]
Templates are a solution for a problem that c++ themselves created
reply
flqn
16 hours ago
[-]
You mean the same problem that the article is trying to solve in C? Generic data structures?
reply
menaerus
16 hours ago
[-]
What problem?
reply
germandiago
13 hours ago
[-]
If I have to do that I'd rather use C++ templates directly.
reply
teo_zero
19 hours ago
[-]
> Structurally identical types will be considered the same type in GCC 15 and Clang later in 2025 thanks to a rule change

Beware that only tagged unions are considered the same type under the new rule, provided they have the same structrure and the same tag.

The List(T) macro should be changed to generate a different tag for each different T. Which is trivial (with ##) for simple one-word types, but impossible for even mildly complex ones like pointers to char (strings).

Of course you can force yourself to typedef any type before using it in a List, but it looses much of its versatility. Example:

  typedef char *str;
  List(str) my_list_of_str;
  List(str) tokenize(str input) {...}
reply
alcover
13 hours ago
[-]
> Beware that only tagged unions are considered the same type

I don't get it. Tagged union is just a design pattern.

reply
wahern
10 hours ago
[-]
By tagged they mean have an identifier. Compare

  > struct { ... } foo;
and

  > struct bar { ... } foo;
The latter has an identifier, bar; the former doesn't. The standard uses tag to refer to the identifier name, if any, in an enum, struct, or union declaration.
reply
teo_zero
9 hours ago
[-]
Exactly, thank you.

I've always called "tag" the id that optionally follows struct/union/enum. Is it the wrong word? Some specs call it "name", but "unnamed union" sounds dangerously similar to "anonymous union", which is a different concept, namely (no pun!) an unnamed member of an outer struct or union whose submembers can be accessed as if they belong in the outer one. E.g.

  struct {
    struct {
      int m;
    }; // no name: anonymous
  struct s;
  s.m = 1;
reply
uecker
1 day ago
[-]
It is cool trick. I already use in my experimental library though ;-) https://github.com/uecker/noplate/blob/main/src/list.h
reply
eqvinox
1 day ago
[-]
I guess if anyone might know it might be you—do you see any way of doing this for intrusive data structures, embedding the node struct in the data (and as side effect supporting an object to be on multiple containers) rather than the data in the node like you're doing there?
reply
uecker
1 day ago
[-]
You could put the dummy member into the embedded node. But for intrusive data structures you often want them to erase the type so that you write generic algorithms as regular functions. In this case, it makes more sense to have a run-time check do to down casts. I do this with my variadic type which has an intrusive "super" member: https://godbolt.org/z/ofdKe7Pfv The overhead is often completely removed by the compiler.
reply
eqvinox
1 day ago
[-]
Mhm. Putting the dummy member into the embedded node doesn't give a path from the proper object to find the embedded node "mid-struct". run-time checks are the "easy way out". We/I'll stick to macro soup probably, so we have compile-time checks.

btw. For ISO C WG14… has anyone suggested adding _Include to the preprocessor, along the lines of _Pragma? It'd really help with doing this kind of really long macros, hiding the clunky "#define, #define, #define, #include" inside a macro…

reply
uecker
1 day ago
[-]
I don't think anybody has proposed this, at least not recently. There is a proposal for multi-line macros: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
reply
eqvinox
1 day ago
[-]
That one might actually be better, nice to know & thanks for the pointer! (I hope it actually goes somewhere…)
reply
jayde2767
4 hours ago
[-]
This is all well and good but, for my mileage, nothing can ever beat C++ generics and the RAII pattern when there is a choice between plain old C and C++ . One second while I get my fire-retardant suit on...
reply
makz
10 hours ago
[-]
Interesting but too complicated for me.

For my own hashmap implementation I followed a wasteful aproach since I’m. It targeting embedded.

I created a structure called a hashmap object. It has two elements: a void pointer and a char pointer. The first one is the data and the second one is the metadata. The metadata is basically a string were the user can put anything, the type of the data, more data, whatever.

Then I preallocate 10s of thousands of hashmap objects. That way users of my hashmap don’t have to think about aollocating and de allocating hashmap nodes, they just insert, delete and search freely. They still have to care about allocating and de allocating they’re own data though.

reply
mfuzzey
1 day ago
[-]
There's also the method used in the Linux kernel to embed the list information (struct list_head) within the type specific struct. https://kernelnewbies.org/FAQ/LinkedLists
reply
nixpulvis
1 day ago
[-]
The naming of LIST_HEAD_INIT and INIT_LIST_HEAD is confusing to me.
reply
mfuzzey
1 day ago
[-]
The way I remember it is:

INIT_LIST_HEAD is of form VERB_NOUN so is called from within a function to programatically initialise the list.

LIST_HEAD_INIT is NOUN_VERB and is used within a structure initialiser not from a function.

But my main point was to show the "embed the list in the data" approach rather than "embed the data in the list" or "point to the data from the list" and not to discuss the naming details in the kernel implementation of the concept.

reply
el_pollo_diablo
15 hours ago
[-]
Not to mention that they insist on calling every entry of the list a "list head", which makes no sense (hysterical raisins, maybe?). The structure is made of a uniform loop of entries, one of which is used as the actual head & tail, or entry point into the structure.
reply
antonvs
9 hours ago
[-]
In general, there is no “actual” head and tail - you could have multiple references to different parts of the list, and each of them would have a different head. If you’re recursing through a list, at some point every node will be used as the head. This is a common pattern in recursive data structures, particularly in functional languages.

Disclaimer: I haven’t looked at this author’s code, just pointing out that list nodes that consist of (head, tail) are a common pattern with a clear rationale.

reply
RustyRussell
14 hours ago
[-]
Yes, it's terrible, and the fact that their list_add takes parameters backwards from what one might expect, with no types to catch mistakes!

See https://github.com/rustyrussell/ccan/blob/master/ccan/list/_...

reply
el_pollo_diablo
14 hours ago
[-]
Absolutely. Wrapping the distinguished entry point in a new structure type equipped with a thin type-safe wrapper API that covers the most common use case is the way to go.
reply
eqvinox
1 day ago
[-]
Interesting idea with the union and using typeof(). We (I) went with large macros defining wrappers instead, which, I believe, is a necessity with intrusive data structures, or at least I don't immediately see how to do that with unions & typeof. Maybe it's possible...?

e.g. hash table wrapper: https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...

(cf. https://docs.frrouting.org/projects/dev-guide/en/latest/list...)

reply
HexDecOctBin
1 day ago
[-]
The key idea here seems to be to use function pointer's type to enforce type safety rather than using the data "handle" type (that is often found in implementations inspired by Sean Barrett's strechy_buffers).

> One annoying thing about C is that it does not consider these two variables to have the same type

C23 solves that too: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf

Supported by latest GCC and Clang, but not by MSVC.

reply
dhooper
1 day ago
[-]
Author here. Not quite. The key idea is about using a union to associate type information with a generic data type. Type casting a function is not the only way to use that type information. I discuss that as well as the C23 changes in the footnotes and the "typeof on old compilers" section.
reply
wahern
1 day ago
[-]
FWIW, as far back as 2015 my feature check library documents Visual Studio as supporting "__typeof".[1] Note the leading but not trailing underscores. Perhaps I was mistaken, but I usually tested that sort of thing. It's also possible __typeof had slightly different semantics.

[1] See https://github.com/wahern/autoguess/blob/b44556e4/config.h.g... (that's the 2015 revision, but HEAD has the same code).

reply
dhooper
1 day ago
[-]
msvc 19.39 is the first to support it, which I mention in the article. You can confirm it didn't work up through 19.38 in godbolt [1]. I don't use Visual Studio, so I don't know what version of that first started using msvc 19.39

[1] https://godbolt.org/z/M7zPYdssP

reply
wahern
22 hours ago
[-]
This is gonna haunt me.

Digging through some old code of mine (circa 2009) I found this bit:

  #elif _MSC_VER >= 1310
  #define typeof(type)    __typeof(type)
So somehow I had the impression Visual Studio .NET 2003 (7.1)[1] added __typeof. I'm still holding out hope someone will come to my rescue and reply that once upon a time MSVC had __typeof, but removed it. But for now it seems past me is gaslighting present me.

[1] See https://learn.microsoft.com/en-us/cpp/overview/compiler-vers... for mapping between Visual Studio versions and _MSC_VER.

EDIT: Ah ha! It seems Microsoft did support __typeof, but perhaps only for "managed" C++ (aka C++ .NET)?

> One thing to watch out for when using the __typeof operator in managed C++ is that __typeof(wchar_t) can return different values depending on the compilation options.

Source: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002... (See also https://learn.microsoft.com/en-us/archive/msdn-magazine/2005...)

reply
lakjwerlkawjrl
8 hours ago
[-]
"I write type-safe generic data structures in C"

sounds like

"I build life-size houses out of toothpicks"

I can do that to, but I have no desire to. It's so much easier to just switch to Rust.

reply
WalterBright
1 day ago
[-]
Here's how to do it in D:

    struct ListNode(T) {
        ListNode* next;
        T data;
    }

    T!int node;
Why suffer the C preprocessor? Using preprocessor macros is like using a hammer for finish carpentry, rather than a nail gun. A nail gun is 10x faster, drives the nail perfectly every time, and no half moon dents in your work.
reply
dhooper
1 day ago
[-]
Thanks, this post is about C.

On some projects you must use C.

reply
WalterBright
1 day ago
[-]
If I may may be provocative :-) this post isn't about C. It's about layering on a custom language using C preprocessor macros.

My compilers were originally written in C. I started using the C preprocessor to do metaprogramming. After some years I got fed up with it and removed nearly all of the preprocessor use, and never looked back. My code was much easier to understand.

An amusing story: long ago, a friend of mine working for Microsoft was told by a team leader that a 50K program had a bug in it, and sadly the developer was long gone. He'd assigned programmer after programmer to it, who could not fix it. My friend said he'd give it a try, and had it fixed in 2 hours.

The source code was written in Microsoft MASM, where the M stood for "Macro". You can guess where this is going. The developer had invented his own high level language using the macro system (which was much more powerful than C's). Unfortunately, he neglected to document it, and the other programmers spent weeks studying it and could not figure it out.

The leader, astonished, asked him how he figured it out in 2 hours? My friend said simple. He assembled it to object code, then disassembled the object code with obj2asm (a disassembler I wrote that converts object code back to source code). He then immediately found and fixed the bug, and checked in the "new" source code which was the disassembled version.

I've seen many very smart and clever uses of the C macros, the article is one of them. But isn't it time to move on?

reply
uecker
18 hours ago
[-]
I could tell a similar story (many, in fact) about C++'s templates. It is not entirely clear to me what exactly makes the preprocessor a bad choice. One could argue that it is too flexible, so it is possible to create a mess with it. But somehow this seems a rather weak argument for inventing another monomorphization layer, which often evolve into their own mess.
reply
WalterBright
8 hours ago
[-]
You can definitely make an incomprehensible soup out of C++ templates. C++ expression templates are a classic example. But the threshold of this is much higher than with a macro language.

Using C++ templates for a linked list type doesn't make a mess.

reply
uecker
2 hours ago
[-]
I am not sure, I quite like the new C macro-templates. I also do not think the implementation is messy. Can you narrow down specific aspects where you think using macros for this is problematic?
reply
ryao
1 day ago
[-]
If the C compiler accepts it, it is C.
reply
WalterBright
1 day ago
[-]
Pedantically, the preprocessor is an entirely separate language. The lexing, parsing, expressions, and semantics are totally distinct. The preprocessor is usually implemented as a completely independent program. My first C compiler did integrate the preprocessor with the C compiler, but that was for performance reasons.

Currently, ImportC runs cpp and then lexes/parses the resulting C code for use in D.

reply
ryao
23 hours ago
[-]
It is part of the C standard. Whether it is part of a separate binary is an implementation choice.
reply
WalterBright
23 hours ago
[-]
True on both counts. But they are still separate and distinct languages.
reply
ryao
12 hours ago
[-]
C is a composition of what you describe as separate languages. They are both parts of C. That is why we call unpreprocessed code C code.
reply
zabzonk
1 day ago
[-]
There is no one "the C compiler".
reply
ryao
23 hours ago
[-]
Pragmatically, the only C compiler that matters for what is or is not C is the one you are using.
reply
zabzonk
23 hours ago
[-]
Only if you are lucky enough to only use one compiler, or only one version of the same one.
reply
dfawcus
7 hours ago
[-]
Nah - one would simply use a punch with the hammer.

So nail the architrave with the hammer until the nail is say 1/8" proud, then punch it home.

reply
WalterBright
6 hours ago
[-]
There's a reason carpenters use nail guns. Try it and you'll see!
reply
dfawcus
3 hours ago
[-]
Well, obviously speed when one is on the clock.

All power tools have trade offs vs the manual alternates, often tipping towards their use, but not always.

I have had success on an occasion using a belt fed screw gun when fitting plaster boards, especially for ceiling boards.

However I don't do joinery often enough to justify the use of a nail gun.

reply
tehnub
1 day ago
[-]
When I saw the title I assumed it was originally "Why I" or "How I" and was trimmed automatically by HN, but this is the original. Could it be that the author was influenced by HN's title guidelines and titled it thus?
reply
david2ndaccount
1 day ago
[-]
The “typeof on old compilers” section contains the code:

         (list)->payload = (item); /* just for type checking */\
That is not a no-op. That is overwriting the list head with your (item). Did you mean to wrap it in an `if(0)`?
reply
josephg
1 day ago
[-]
In that example they also had replace the union with a struct - presumably to work around this issue. But that seems wasteful to me too. Doing it within an if(0) seems strictly better.
reply
hgs3
1 day ago
[-]
I'm curious what a hashmap looks like with this approach. It's one thing to pass through or hold onto a generic value, but another to perform operations on it. Think computing the hash value or comparing equality of generic keys in a generic hashmap.
reply
lhearachel
1 day ago
[-]
I first would question what a user wants to do with a hashmap that uses polymorphic key-values of unknowable type at compile-time.

As a thought experiment, you could certainly have users define their own hash and equality functions and attach them to the table-entries themselves. On first thought, that sounds like it would be rife with memory safety issues.

At the end of the day, it is all just bytes. You could simply say that you will only key based on raw memory sequences.

reply
KerrAvon
6 hours ago
[-]
> I first would question what a user wants to do with a hashmap that uses polymorphic key-values of unknowable type at compile-time.

All of Apple's modern platforms use this concept pervasively, because the Objective-C Foundation framework has a common primitive data structure for it (NSDictionary).

reply
asplake
1 day ago
[-]
Interesting! I’m working on toy/educational generator of ML-style tagged variants and associated functions in C (for a compiler) and when I’m a bit further along I will see if they’re compatible.
reply
ryao
1 day ago
[-]
uint64_t data[] in level 2 violates the strict aliasing rule. Use the char type instead to avoid the violation.
reply
ape4
1 day ago
[-]
Or write in CFront and have it translated to C
reply
zabzonk
1 day ago
[-]
And where are you going to get a cfront compiler these days?
reply
mingodad
15 hours ago
[-]
reply
pjmlp
17 hours ago
[-]
Welcome to around 1992, when Borland published BIDS 1.0 using similar tricks, as C++ templates were yet to be made part of the language.

With similar articles in The C/C++ Users Journal and Dr Dobbs, throught their existence.

While the effort is commendable, maybe pick a safer language that actually has all the features for type safe generic code.

reply
SuperV1234
13 hours ago
[-]
Level 4: switching to C++
reply
JacksonAllan
1 day ago
[-]
I think the idea of using a union to store the element type without any extra run-time memory cost might have some use, specifically in cases where the container struct wouldn't typically store a variable of the element type (or, more likely, a pointer to the element's type) but we want to slip that type information into the struct anyway.

However, the problem that I have with this idea as a general solution for generics is that it doesn't seem to solve any of the problems posed by the most similar alternative: just having a macro that defines a struct. The example shown in the article:

    #define List(type) union { \
        ListNode *head; \
        type *payload; \
    }
could just as easily be:

    #define List(type) struct { \
        type *head; \
        /* Other data, such as node/element count... */ \
    }
(As long as our nodes are maximally aligned - which they will be if they're dynamically allocated - it doesn't matter whether the pointer we store to the list head is ListNode *, type *, void *, or any other regular pointer type.)

The union approach has the same drawback as the struct approach: untagged unions are not compatible with each other, so we have to typedef the container in advance in order to pass in and out of functions (as noted in the article). This is broadly similar to the drawback from which the "generic headers" approach (which I usually call the "pseudo-template" approach) suffers, namely the need for boilerplate from the user. However, the generic-headers/pseudo-template approach is guaranteed to generate the most optimized code thanks to function specialization[1], and it can be combined with another technique to provide a non-type-prefixed API, as I discuss here[2] and demonstrate in practice here[3].

I'd also like to point to my own approach to generics[4] that is similar to the one described here in that it hides extra type information in the container handle's type - information that is later extracted by the API macros and passed into the relevant functions. My approach is different in that rather than exploiting unions, it exploits functions pointers' ability to hold multiple types (i.e. the return type and argument types) in one pointer. Because function pointers are "normal" C types, this approach doesn't suffer from the aforementioned typedef/boilerplate problem (and it allows for API macros that are agnostic to both element type/s and container type). However, the cost is that the code inside the library becomes rather complex, so I usually recommend the generic-headers/pseudo-template approach as the one that most people ought to take when implementing their own generic containers.

[1] https://gist.github.com/attractivechaos/6815764c213f38802227...

[2] https://github.com/JacksonAllan/CC/blob/main/articles/Better...

[3] https://github.com/JacksonAllan/Verstable

[4] https://github.com/JacksonAllan/CC

reply
notnmeyer
1 day ago
[-]
pretty sure C is the new Go.
reply
qustrolabe
1 day ago
[-]
pretty sure C has to go
reply
revskill
1 day ago
[-]
Without the concurreny part.
reply
oflebbe
1 day ago
[-]
OpenMP to the rescue
reply
sltkr
1 day ago
[-]
Or garbage collection. Or interfaces. Or packages. Or actual generics.
reply
adamnemecek
22 hours ago
[-]
Dude the ship has sailed.
reply
dhooper
21 hours ago
[-]
Not sure what you mean by that, but if you're trying to imply that C is not relevant: https://www.tiobe.com/tiobe-index/

Plus an article about C was at the top of hacker news all day today.

reply
tialaramex
12 hours ago
[-]
TIOBE is bullshit. When somebody cites TIOBE what they mean is either "I have no idea what I'm talking about, but here's something authoritative seeming that agrees with me" or worse "I know this is bullshit and I don't care"

C is still relevant, but TIOBE numbers are not.

reply
adamnemecek
8 hours ago
[-]
Not C per se, but trying to make C type safe.
reply
monkeyelite
1 day ago
[-]
Another way is to not try to write generic data structures. When you tailor them to the use case you can simplify.

The #1 data structure in any program is array.

reply
gpderetta
12 hours ago
[-]
The irony is that arrays in C are in fact generic. As that is the simplest generic solution that doesn't require boiling the ocean first, that's what many programmers reach for.
reply
dwattttt
1 day ago
[-]
When all you have are arrays, everything looks like a problem you solve with arrays.

There are quite a few problems that specialised containers are suited for, that's why they were created.

reply
monkeyelite
23 hours ago
[-]
And you can write them when you need them.

The situation where you need a red black tree with 10 different key/value combos isn’t real.

reply
el_pollo_diablo
18 hours ago
[-]
If, by "situation", you mean the development of a small program with so many constraints that using existing libraries is out if the question, then yes.

Otherwise, that seems unwise to me. Not every user of a generic type has to be generic. A major selling point of generic types is that you write a library once, then everyone can instantiate it. Even if that is the only instance they need in their use case, you have saved them the trouble of reinventing the wheel.

No colleague of mine may need 10 different instances of any of my generic libraries, but I bet that all of them combined do, and that our bosses are happy that we don't have to debug and maintain 10+ different implementations.

reply
monkeyelite
5 hours ago
[-]
Ok you’re telling me the upside which we already know. Now what’s the downside?
reply
gpderetta
12 hours ago
[-]
If you can only write programs by banging rocks together, you will only produce programs that can be written by banging rocks together.
reply
monkeyelite
20 minutes ago
[-]
And if you're used to writing programs with Dictionary<String, Dictionary<String, Array<String>>> you will do that too.
reply
dwattttt
21 hours ago
[-]
You could take away anything you use and say "but we could make it ourselves", that doesn't mean it's helpful.
reply
monkeyelite
18 hours ago
[-]
Except it’s very common for C programs to contain one-off data structures, so it’s not a hypothetical. It’s a concrete programming style.
reply
dwattttt
15 hours ago
[-]
Do you mean a data structure they only use once? Or one that's never been done elsewhere? If they only use it once, that seems like the worst effort/pay-off ratio you can get writing it yourself. And I don't think there's that many fundamental data structures out there... and even then, why would it be good to be forced to make your bespoke structure out of only arrays, when things like maps exist?
reply
monkeyelite
5 hours ago
[-]
> And I don't think there's that many fundamental data structures out there

No. There are a few fundamental ones which work well in a generic context. When you tailor make it you find simplifying assumptions.

One I am aware of is a hash map that doesn’t need to delete individual keys.

reply
cb321
4 hours ago
[-]
This is true of the common tombstone approach to deletions in hash tables { which also require rehashing (like in resizing) if there are too many tombstones }.

Somehow dissemination of Knuth v3,chapter6.4 Algorithm D has been weak, though it has lately become known as "backshift deletion" - unsure who coined that. It is limited to linear probing (but then these days that also creates the least memory traffic/potential latency). With this approach, there is no real specialized form of "hash tables that can delete". You may already know all of this and not disagreeing with anything. Just a natural follow-up.

reply
monkeyelite
21 minutes ago
[-]
If we look at the major standard libraries for hash tables, you're telling me there will be no overhead to supporting the ability to delete keys?

Isn't resizing logic already a counterexample?

reply
el_pollo_diablo
18 hours ago
[-]
Sure, but it is also very common for C programs to contain data structures that have one use in the program, and could still be instances of a generic type. You mentioned red black trees, which are a perfect example of that.
reply