For your level 3 code, it should be `int main() { List(Foo) foo_list = {NULL};`
Note that working around a lack of `typeof` means you can't return anything. Also, your particular workaround allows `const`ness errors since `==` is symmetrical.
You can't safely omit `payload` since you need it to know the correct size. Consider a `List(int64_t)` and you try to add an `int32_t` to it - this should be fine, but you can't `sizeof` the `int32_t`. Your code is actually lacking quite a bit to make this work.
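A minimal sketch of one way to make that work, assuming `typeof` is available (GCC/Clang or C23) and the union-based `List(type)` from the article; `list_push_` is a hypothetical helper, not the author's code:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct ListNode { struct ListNode *next; char data[]; } ListNode;

    #define List(type) union { ListNode *head; type *payload; }

    static void list_push_(ListNode **head, const void *data, size_t size) {
        ListNode *node = malloc(sizeof *node + size);
        memcpy(node->data, data, size);
        node->next = *head;
        *head = node;
    }

    /* The element size comes from the list's payload type, not from the item:
       adding an int32_t to a List(int64_t) still copies 8 bytes, because the
       item is first converted via a compound literal of the payload type. */
    #define list_add(list, item)                              \
        list_push_(&(list)->head,                             \
                   &(typeof(*(list)->payload)){ (item) },     \
                   sizeof(*(list)->payload))

    int main(void) {
        List(int64_t) nums = { NULL };
        int32_t small = 42;
        list_add(&nums, small);   /* converted to int64_t before copying */
        return 0;
    }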
=====
There are 2 major limitations to generics in C right now:
* Delegating to a vtable (internal or external) is limited in functionality, since structs cannot contain macros, only functions.
* Delegating to an external vtable (mandatory to avoid overhead) means that you have to forward-declare all of the types you'll ever use a vtable with. So far the best approach I've found is to declare (but not define) static functions in the same forwarding header I declare the typedefs in; note that GCC and Clang differ in what phase the "undefined static" warning appears in for the case where you don't actually include that particular type's header in a given TU.
(think about writing a function that accepts either `struct SizedBuffer {void *p; size_t len;};` or `struct BoundedBuffer {void *begin; void *end;};`, and also const versions thereof - all from different headers).
We went down the rabbit hole of writing a compiler for this as part of a project I used to work on (Apache Clownfish[1], a subproject of the retired Apache Lucy project). We started off parsing .h files, but eventually it made sense to create our own small header language (.cfh "Clownfish Header" files).
Here's some generated code for invoking the CharBuf version of the "Clone" method defined in parent class "Obj":
    typedef cfish_CharBuf*
    (*CFISH_CharBuf_Clone_t)(cfish_CharBuf* self);

    extern uint32_t CFISH_CharBuf_Clone_OFFSET;

    static inline cfish_CharBuf*
    CFISH_CharBuf_Clone(cfish_CharBuf* self) {
        const CFISH_CharBuf_Clone_t method
            = (CFISH_CharBuf_Clone_t)cfish_obj_method(
                  self,
                  CFISH_CharBuf_Clone_OFFSET
              );
        return method(self);
    }
Usage:

    cfish_CharBuf *charbuf = cfish_CharBuf_new();
    cfish_CharBuf *clone   = CFISH_CharBuf_Clone(charbuf);
We had our reasons for going to these extremes: the point of Clownfish was to provide a least-common-denominator object model for bindings to multiple dynamic languages (similar problem domain to SWIG), and the .cfh files also were used to derive types for the binding languages. But there was truly an absurd amount of boilerplate being generated to get around the issue you identify. This is why almost everybody just uses casts to void* for the invocant, skipping type safety.
In C `int main()` means the function takes an unknown number of arguments. You need `int main(void)` to mean it doesn't take any arguments. This is a fact frequently forgotten by those who write C++.
It's not really relevant for main() though, even in older C versions main() works fine and simply means "I don't need argc and argv".
> 14. An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters.
https://stackoverflow.com/questions/156767/whats-the-differe...
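A minimal illustration of the distinction for declarations (C23 changes the first case so that () means (void)):

    int f();        /* declaration: parameters left unspecified (before C23) */
    int g(void);    /* declaration: takes no arguments */

    int use(void) {
        return f(1, 2)   /* accepted pre-C23, even if f's definition takes none */
             + g();      /* calling g(1) instead would be a compile-time error */
    }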
In C I guess that's true. In languages more concerned with compile-time rigor, it often isn't. Not a correction, just an observation.
That’s not the distinction being made by those terms.
“Parameter” refers to a named variable in a function definition.
“Argument” refers to an actual value that’s passed to a function when it’s called.
It’s exactly the same as the distinction between variables and values (which you probably see the use for), just applied to the special cases of function signatures and function calls.
(As an aside, in the lambda calculus this relationship becomes a perfect equivalence: all variables are parameters and all values are arguments.)
> "Parameters are named variables declared as part of a function. They are used to reference the arguments passed into the function."
-- MDN, https://developer.mozilla.org/en-US/docs/Glossary/Parameter
> "A parameter is a special kind of variable used in a function to refer to one of the pieces of data provided as input to the function. These pieces of data are the values of the arguments with which the function is going to be called/invoked."
-- Programming Fundamentals, https://press.rebus.community/programmingfundamentals/chapte...
> "Parameters refer to the variables listed in a function's declaration, defining the input that the function can accept. Arguments, however, are the actual values passed to the function when it is called, filling the parameters during execution."
-- https://www.geeksforgeeks.org/computer-science-fundamentals/...
While you might be tempted to "do you" and use your own idiosyncratic definitions, I advise against it, since it makes it difficult for you to understand what others are saying, and vice versa.
...by what authority? c'mon, communication is important, and insisting on the correctness of definitions tanks that.
EDIT: however, I will concede there's good evidence for widespread usage of this, and I'll adjust my usage accordingly. Insisting on "correctness" is just asinine, though.
Of course de-facto things are more nuanced.
C23 changed what fn() means outside a function definition.
    struct foo { ... };
    struct bar { ... };

    struct bar check_this_out(struct foo foo) {
        return ((union { struct foo foo; struct bar bar; }){ .foo = foo }).bar;
    }

    struct bar *also_this(struct foo *foo) {
        union { struct foo foo; struct bar bar; } *tmp = (void*)foo;
        return &tmp->bar;
    }
`malloc(sizeof(*node) + data_size);`
The trick#0 you mention is how I made an entire C dialect. Here is a generic binary heap, for example https://github.com/gritzko/librdx/blob/master/abc/HEAPx.h The syntax is a bit heavyweight, but a huge huge advantage is: you get regular C structs in the end, very plain, very predictable, very optimizable. Compiler would eat them like donuts.
In the other cases, it is void* and runtime memory sizing and you have to define macros anyway.
There are workarounds for at least two of the problems the author mentions. Naming can be changed from Bar_func(args…) to func(Bar)(args…) with a function name macro that just does name mangling. You can avoid some of the binary bloat by using weak symbols, letting the linker deduplicate functions shared between translation units at link time.
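A sketch of the name-mangling idea, with hypothetical names (one macro per operation, so the call site reads push(IntVec)(&v, 42) rather than IntVec_push(&v, 42)):

    #include <stddef.h>

    #define push(T) T##_push

    typedef struct { int data[8]; size_t len; } IntVec;

    static void IntVec_push(IntVec *v, int x) {   /* the per-type function */
        if (v->len < 8)
            v->data[v->len++] = x;
    }

    int main(void) {
        IntVec v = { .len = 0 };
        push(IntVec)(&v, 42);    /* expands to IntVec_push(&v, 42) */
        return v.data[0] == 42 ? 0 : 1;
    }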
There are other problems for generic containers of pointer types, however; you can work around them by using a typedef or a type alias.
Intrusive data structures are more convenient in C still, but working with them in a debugger is a pain.
There is little benefit in monomorphizing the implementation of a data structure like a linked list where its behavior doesn't depend on the contents of the data it contains (compared to, say, a max heap)
Made me laugh out loud!
This casting of the functions to different argument types constitutes the core of the type safety of the generic invocations; I’m not sure it can be fixed.
I also don’t agree it’s “squeamish” to be wary of aliasing analysis going wrong. It’s not a clean abstraction and can hide subtle bugs down the road.
Some linkers will merge

    int f0(int x) { return x; }
    int f1(int x, int y) { return y; }

into a single function, so at runtime, (void(*)())f0 and (void(*)())f1 would compare equal. AFAIK, you are not guaranteed by the standard that functions of different signatures would, when their addresses are taken, result in different pointers, so it's not technically a bug... but it's quite surprising, and ruins certain tricks.

I've mostly seen it done in Haskell, but have used it in Scala as well to simulate type hierarchies not present in the actual type system.
In a way, this union trick is kind of like a phantom type, since the secondary type is never actually used.
Now I'm wondering what phantom types would look like in C...
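A rough sketch of what that might look like: the `unit` parameter exists only in the type, never in the data, and all names here are hypothetical:

    /* The phantom member is a pointer to a type that is never defined. */
    typedef struct Meters Meters;   /* deliberately never defined */
    typedef struct Feet   Feet;

    #define Distance(unit) struct { double value; unit *phantom_; }

    typedef Distance(Meters) DistMeters;
    typedef Distance(Feet)   DistFeet;

    static DistMeters meters(double v) { return (DistMeters){ .value = v }; }
    static double     raw(DistMeters d) { return d.value; }

    /* meters(3.0) cannot be passed where a DistFeet is expected, even though
       both structs hold nothing but a double at run time. */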
8-/
however we can set a standard and expectation for new projects to use c++, and we do and set an expectation to target a specific std.
i see this sentiment quite a lot on hackernews -- feels like a lot of people saying "git gud" -- i would expect a lot more nuance applied here.
Traditionally microcontroller firmwares as well, though those are increasingly friendly to C++, you just have to be careful about allocations as C++ makes it way easier to accidentally allocate than C does.
Obviously you mean the Linux kernel, specifically. Also C++ has gotten a lot better since 2003 and before.
Examples of C++ usage in commercial codebases:
- anything Nintendo has written since 2010 (including microkernel, entire OS and all, for 3DS and Switch 1/2)
- well-known, high performing databases like ScyllaDB
> you just have to be careful about allocations as C++ makes it way easier to accidentally allocate than C does.
With the exception of exceptions (and coroutines but that's easily customizable), it's merely a standard library thing and doesn't affect language features (incl. language features exposed as std:: funcs/traits)
C++ has got a bad rep because it never fixes broken stdlib parts due to API and ABI concerns, and has many footguns due to this, that might make Rust and C better choices in corporate environments. However, IMHO, C++ is a much better language than C for personal projects, or when having a team of experienced folks.
Anecdotally, GCC and GDB successfully (and incrementally) switched to C++ from C in the recent past.
The Linux kernel will never do it for ideological reasons of course.
I don't know about Postgres.
> Obviously you mean the Linux kernel, specifically.
Or any BSD, or Illumos, or Solaris, or any Unix-derived kernel, or... Even the Windows kernel is in C (or a very cut-down C++).
    struct foo decl = {
        .member = /* ... */,
        .next = &(struct nested_pointer) {
            .nested_member = /* ... */,
        },
        .array = (struct nested_array[]) {
            [0] = { /* ... */ },
        }
    };

This pattern does not work in C++ as the nested declarations become temporaries.

The question was "please provide examples where switching to C++ involves jumping through even more hoops", and in my view requiring downstream to use a C++ environment when they're expecting to use a C environment qualifies.
Unless of course the project is using such an old compiler.
> yes, unless you're using a common one.
I was really disappointed that Microsoft decided to backtrack on "C++ is the future", after the new-found Linux and FOSS love.
https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...
https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...
Not that it matters much, as nowadays C and C++ have a new policy at Microsoft due to governments and cyberlaws.
https://azure.microsoft.com/en-us/blog/microsoft-azure-secur...
https://blogs.windows.com/windowsexperience/2024/11/19/windo...
Beware that only tagged unions are considered the same type under the new rule, provided they have the same structure and the same tag.
The List(T) macro should be changed to generate a different tag for each different T. Which is trivial (with ##) for simple one-word types, but impossible for even mildly complex ones like pointers to char (strings).
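A sketch of the ##-based tagging for the simple case, assuming C23's tag-compatibility rule; `ListNode` is the node type from the article:

    typedef struct ListNode ListNode;   /* as in the article */

    /* Paste the element type into the union's tag so every expansion of
       List(int) names the same type. Only works when T is a single identifier. */
    #define List(T) union List_##T { ListNode *head; T *payload; }

    List(int) a;   /* defines union List_int */
    List(int) b;   /* same tag, same members: OK in C23, a redefinition error before */
    /* List(char *) c;   no way to paste "char *" into a valid tag name */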
Of course you can force yourself to typedef any type before using it in a List, but it loses much of its versatility. Example:
typedef char *str;
List(str) my_list_of_str;
List(str) tokenize(str input) {...}
I don't get it. Tagged union is just a design pattern.
> struct { ... } foo;

and

> struct bar { ... } foo;

The latter has an identifier, bar; the former doesn't. The standard uses "tag" to refer to the identifier name, if any, in an enum, struct, or union declaration.

I've always called "tag" the id that optionally follows struct/union/enum. Is it the wrong word? Some specs call it "name", but "unnamed union" sounds dangerously similar to "anonymous union", which is a different concept, namely (no pun!) an unnamed member of an outer struct or union whose submembers can be accessed as if they belong in the outer one. E.g.
    struct {
        struct {
            int m;
        };  // no name: anonymous
    } s;

    s.m = 1;
btw. For ISO C WG14… has anyone suggested adding _Include to the preprocessor, along the lines of _Pragma? It'd really help with this kind of really long macro, hiding the clunky "#define, #define, #define, #include" inside a macro…
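For context, the clunky idiom that _Include would help hide looks roughly like this (parameter names are hypothetical, not HEAPx.h's actual interface); the point is that #include cannot be produced by a macro expansion today, so every instantiation has to spell it out:

    /* Set the "template parameters", then include the template header,
       which typically #undefs them at the end. */
    #define HEAP_NAME  IntHeap
    #define HEAP_TYPE  int
    #define HEAP_LESS(a, b) ((a) < (b))
    #include "heap_template.h"   /* hypothetical pseudo-template header */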
For my own hashmap implementation I followed a wasteful approach since I'm not targeting embedded.
I created a structure called a hashmap object. It has two elements: a void pointer and a char pointer. The first one is the data and the second one is the metadata. The metadata is basically a string where the user can put anything: the type of the data, more data, whatever.
Then I preallocate tens of thousands of hashmap objects. That way users of my hashmap don't have to think about allocating and deallocating hashmap nodes; they just insert, delete and search freely. They still have to care about allocating and deallocating their own data though.
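A rough sketch of the layout described (names are illustrative, not the actual implementation):

    #include <stddef.h>

    typedef struct {
        void *data;      /* caller-owned payload */
        char *metadata;  /* free-form string: type info, extra data, whatever */
    } HashmapObject;

    /* Preallocated pool, so users never allocate or free nodes themselves. */
    #define POOL_SIZE (50 * 1000)
    static HashmapObject pool[POOL_SIZE];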
INIT_LIST_HEAD is of form VERB_NOUN so is called from within a function to programmatically initialise the list.
LIST_HEAD_INIT is NOUN_VERB and is used within a structure initialiser not from a function.
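A simplified sketch modeled on the kernel's list.h shows why both forms exist:

    #include <stddef.h>

    struct list_head { struct list_head *next, *prev; };

    /* NOUN_VERB: an initializer, usable where only initializers are allowed. */
    #define LIST_HEAD_INIT(name) { &(name), &(name) }

    /* VERB_NOUN: a statement, called at runtime from code. */
    static inline void INIT_LIST_HEAD(struct list_head *list) {
        list->next = list;
        list->prev = list;
    }

    /* Static/global object: initializer form. */
    static struct list_head global_list = LIST_HEAD_INIT(global_list);

    /* Dynamically set-up object: function form. */
    struct widget { int id; struct list_head node; };
    static void widget_init(struct widget *w) {
        w->id = 0;
        INIT_LIST_HEAD(&w->node);
    }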
But my main point was to show the "embed the list in the data" approach rather than "embed the data in the list" or "point to the data from the list" and not to discuss the naming details in the kernel implementation of the concept.
Disclaimer: I haven’t looked at this author’s code, just pointing out that list nodes that consist of (head, tail) are a common pattern with a clear rationale.
See https://github.com/rustyrussell/ccan/blob/master/ccan/list/_...
e.g. hash table wrapper: https://github.com/FRRouting/frr/blob/master/lib/typesafe.h#...
(cf. https://docs.frrouting.org/projects/dev-guide/en/latest/list...)
> One annoying thing about C is that it does not consider these two variables to have the same type
C23 solves that too: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf
Supported by latest GCC and Clang, but not by MSVC.
[1] See https://github.com/wahern/autoguess/blob/b44556e4/config.h.g... (that's the 2015 revision, but HEAD has the same code).
Digging through some old code of mine (circa 2009) I found this bit:
#elif _MSC_VER >= 1310
#define typeof(type) __typeof(type)
So somehow I had the impression Visual Studio .NET 2003 (7.1)[1] added __typeof. I'm still holding out hope someone will come to my rescue and reply that once upon a time MSVC had __typeof, but removed it. But for now it seems past me is gaslighting present me.

[1] See https://learn.microsoft.com/en-us/cpp/overview/compiler-vers... for mapping between Visual Studio versions and _MSC_VER.
EDIT: Ah ha! It seems Microsoft did support __typeof, but perhaps only for "managed" C++ (aka C++ .NET)?
> One thing to watch out for when using the __typeof operator in managed C++ is that __typeof(wchar_t) can return different values depending on the compilation options.
Source: https://learn.microsoft.com/en-us/archive/msdn-magazine/2002... (See also https://learn.microsoft.com/en-us/archive/msdn-magazine/2005...)
sounds like
"I build life-size houses out of toothpicks"
I can do that too, but I have no desire to. It's so much easier to just switch to Rust.
    struct ListNode(T) {
        ListNode* next;
        T data;
    }

    ListNode!int node;
Why suffer the C preprocessor? Using preprocessor macros is like using a hammer for finish carpentry, rather than a nail gun. A nail gun is 10x faster, drives the nail perfectly every time, and no half moon dents in your work.

On some projects you must use C.
My compilers were originally written in C. I started using the C preprocessor to do metaprogramming. After some years I got fed up with it and removed nearly all of the preprocessor use, and never looked back. My code was much easier to understand.
An amusing story: long ago, a friend of mine working for Microsoft was told by a team leader that a 50K program had a bug in it, and sadly the developer was long gone. He'd assigned programmer after programmer to it, who could not fix it. My friend said he'd give it a try, and had it fixed in 2 hours.
The source code was written in Microsoft MASM, where the M stood for "Macro". You can guess where this is going. The developer had invented his own high level language using the macro system (which was much more powerful than C's). Unfortunately, he neglected to document it, and the other programmers spent weeks studying it and could not figure it out.
The leader, astonished, asked him how he figured it out in 2 hours? My friend said simple. He assembled it to object code, then disassembled the object code with obj2asm (a disassembler I wrote that converts object code back to source code). He then immediately found and fixed the bug, and checked in the "new" source code which was the disassembled version.
I've seen many very smart and clever uses of the C macros, the article is one of them. But isn't it time to move on?
Using C++ templates for a linked list type doesn't make a mess.
Currently, ImportC runs cpp and then lexes/parses the resulting C code for use in D.
So nail the architrave with the hammer until the nail is say 1/8" proud, then punch it home.
All power tools have trade offs vs the manual alternates, often tipping towards their use, but not always.
I have had success on an occasion using a belt fed screw gun when fitting plaster boards, especially for ceiling boards.
However I don't do joinery often enough to justify the use of a nail gun.
    (list)->payload = (item); /* just for type checking */\

That is not a no-op. That is overwriting the list head with your (item). Did you mean to wrap it in an `if(0)`?

As a thought experiment, you could certainly have users define their own hash and equality functions and attach them to the table-entries themselves. On first thought, that sounds like it would be rife with memory safety issues.
At the end of the day, it is all just bytes. You could simply say that you will only key based on raw memory sequences.
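A sketch combining both ideas (raw-byte keys plus optional per-entry callbacks); all names here are hypothetical:

    #include <stddef.h>
    #include <string.h>

    typedef struct Entry {
        const void   *key;
        size_t        key_len;
        size_t      (*hash)(const void *key, size_t len);   /* NULL = use default */
        int         (*equal)(const void *a, const void *b, size_t len);
        void         *value;
        struct Entry *next;
    } Entry;

    static size_t default_hash(const void *key, size_t len) {
        const unsigned char *p = key;
        size_t h = 5381;                       /* djb2 over the raw bytes */
        for (size_t i = 0; i < len; i++)
            h = h * 33 + p[i];
        return h;
    }

    static int default_equal(const void *a, const void *b, size_t len) {
        return memcmp(a, b, len) == 0;
    }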
All of Apple's modern platforms use this concept pervasively, because the Objective-C Foundation framework has a common primitive data structure for it (NSDictionary).
With similar articles in The C/C++ Users Journal and Dr Dobbs, throughout their existence.
While the effort is commendable, maybe pick a safer language that actually has all the features for type safe generic code.
However, the problem that I have with this idea as a general solution for generics is that it doesn't seem to solve any of the problems posed by the most similar alternative: just having a macro that defines a struct. The example shown in the article:
    #define List(type) union { \
        ListNode *head;        \
        type *payload;         \
    }

could just as easily be:

    #define List(type) struct { \
        type *head;             \
        /* Other data, such as node/element count... */ \
    }
(As long as our nodes are maximally aligned - which they will be if they're dynamically allocated - it doesn't matter whether the pointer we store to the list head is ListNode *, type *, void *, or any other regular pointer type.)

The union approach has the same drawback as the struct approach: untagged unions are not compatible with each other, so we have to typedef the container in advance in order to pass it into and out of functions (as noted in the article). This is broadly similar to the drawback from which the "generic headers" approach (which I usually call the "pseudo-template" approach) suffers, namely the need for boilerplate from the user. However, the generic-headers/pseudo-template approach is guaranteed to generate the most optimized code thanks to function specialization[1], and it can be combined with another technique to provide a non-type-prefixed API, as I discuss here[2] and demonstrate in practice here[3].
I'd also like to point to my own approach to generics[4] that is similar to the one described here in that it hides extra type information in the container handle's type - information that is later extracted by the API macros and passed into the relevant functions. My approach is different in that rather than exploiting unions, it exploits function pointers' ability to hold multiple types (i.e. the return type and argument types) in one pointer. Because function pointers are "normal" C types, this approach doesn't suffer from the aforementioned typedef/boilerplate problem (and it allows for API macros that are agnostic to both element type/s and container type). However, the cost is that the code inside the library becomes rather complex, so I usually recommend the generic-headers/pseudo-template approach as the one that most people ought to take when implementing their own generic containers.
[1] https://gist.github.com/attractivechaos/6815764c213f38802227...
[2] https://github.com/JacksonAllan/CC/blob/main/articles/Better...
Plus an article about C was at the top of hacker news all day today.
C is still relevant, but TIOBE numbers are not.
The #1 data structure in any program is the array.
There are quite a few problems that specialised containers are suited for, that's why they were created.
The situation where you need a red black tree with 10 different key/value combos isn’t real.
Otherwise, that seems unwise to me. Not every user of a generic type has to be generic. A major selling point of generic types is that you write a library once, then everyone can instantiate it. Even if that is the only instance they need in their use case, you have saved them the trouble of reinventing the wheel.
No colleague of mine may need 10 different instances of any of my generic libraries, but I bet that all of them combined do, and that our bosses are happy that we don't have to debug and maintain 10+ different implementations.
No. There are a few fundamental ones which work well in a generic context. When you tailor make it you find simplifying assumptions.
One I am aware of is a hash map that doesn’t need to delete individual keys.
Somehow dissemination of Knuth vol. 3, chapter 6.4, Algorithm D has been weak, though it has lately become known as "backshift deletion" - unsure who coined that. It is limited to linear probing (but then these days that also creates the least memory traffic/potential latency). With this approach, there is no real specialized form of "hash tables that can delete". You may already know all of this and I'm not disagreeing with anything - just a natural follow-up.
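For reference, a sketch of the backshift deletion step for a linear-probing table (layout and names hypothetical; key 0 is reserved as the empty marker and the "hash" is just key % capacity):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { size_t key; int value; } Slot;

    #define EMPTY 0

    static size_t home(size_t key, size_t cap) { return key % cap; }

    /* Delete the entry at index i by shifting later entries of the same
       probe run back into the hole, so no tombstones are needed. */
    static void backshift_delete(Slot *table, size_t cap, size_t i) {
        size_t j = i;
        for (;;) {
            table[i].key = EMPTY;                    /* open a hole at slot i */
            for (;;) {
                j = (j + 1) % cap;
                if (table[j].key == EMPTY)
                    return;                          /* end of the probe run */
                size_t k = home(table[j].key, cap);
                /* The entry at j may fill the hole only if its home slot k
                   does not lie cyclically in (i, j]. */
                bool k_in_range = (i <= j) ? (i < k && k <= j)
                                           : (i < k || k <= j);
                if (!k_in_range)
                    break;
            }
            table[i] = table[j];                     /* shift it back into the hole */
            i = j;
        }
    }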
Isn't resizing logic already a counterexample?