I avoid doing this now. It's more trouble than it's worth and it changes your code from a standard dialect of C into a custom one. Plus my eyes are old and they don't enjoy separating short identifiers.
> typedef struct { ... } String
I avoid doing this. Just use `struct string { ... };`. It makes it clear what you're handling. C23 finally gave us "auto", so you shouldn't fret over typedefing everything anymore. I also prefer a "strbuf" type with an index and capacity, so I can safely read and write to it, plus a derived "strview" (pointer and length only) that references into the buffer.
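Roughly what I mean, as a minimal sketch (the names and layout are mine, not a standard API):

#include <stddef.h>

/* Owning, growable buffer: writes are bounded by cap. */
struct strbuf {
    char  *data;
    size_t len;   /* bytes currently in use */
    size_t cap;   /* bytes allocated        */
};

/* Non-owning, read-only view into a strbuf (or any memory). */
struct strview {
    const char *data;
    size_t      len;
};

/* Borrow a view of the buffer's current contents. */
static inline struct strview strbuf_view(const struct strbuf *b) {
    return (struct strview){ .data = b->data, .len = b->len };
}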
> returning results
Returning structures larger than two machine words is fairly inefficient in general. Plus you're cutting yourself off from another C23 gem, [[nodiscard]]. If you want the 'ok' value checked, you can _really_ specify that. Put everything else behind a pointer passed as an argument. The sum type logic works just as well there.
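A hedged sketch of that pattern (parse_config and its field are made up for illustration; [[nodiscard]] is C23):

#include <stdbool.h>
#include <stdlib.h>

struct config { long port; };

/* C23: the compiler warns if the caller silently drops the status. */
[[nodiscard]] static bool parse_config(const char *text, struct config *out) {
    if (!text || !out) return false;
    out->port = strtol(text, NULL, 10);
    return out->port > 0 && out->port < 65536;
}

int main(void) {
    struct config cfg;
    if (!parse_config("8080", &cfg)) return 1; /* the 'ok' value has to be looked at */
    return 0;
}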
> I tend to avoid the string.h functions most of the time, only employing the mem family when I want to, well, mess with memory.
So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.
You don't need to support all multibyte encodings (e.g. DBCS, UCS-2, UCS-4, UTF-16 or UTF-32) if you're able to normalise all input to UTF-8.
I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your heart's content.
When you strlen() a UTF8 string, you don't get the length of the string, but instead the size in bytes.
Same with indices. If you index at [1] in a string with a flag emoji, you don't get a valid UTF8 code point, but instead some part of the flag emoji. This applies to any UTF8 code point larger than 1 byte, of which there are a lot.
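To make the point concrete (the byte values below are just the UTF-8 encoding of the two regional-indicator codepoints that make up the French flag):

#include <stdio.h>
#include <string.h>

int main(void) {
    /* French flag emoji: 2 codepoints, 8 bytes, 1 visible glyph. */
    const char *s = "\xF0\x9F\x87\xAB\xF0\x9F\x87\xB7";
    printf("%zu\n", strlen(s));              /* 8: the size in bytes, not the "length"     */
    printf("0x%02X\n", (unsigned char)s[1]); /* 0x9F: a continuation byte, not a codepoint */
    return 0;
}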
UTF16 or UTF32 are just different encodings.
What am I missing?
That's why UTF8 libraries exist.
Yes, and?
> What am I missing?
A use-case? Where, in your C code, is it reasonable to get the number of multibyte characters instead of the number of bytes in the string?
What are you going to use "number of unicode codepoints" for?
Any usage that amounts to "I need the number of unicode codepoints in this string" is coupled to handling the display of glyphs within your program, in which case you'd be using a library for that anyway, because graphics is not part of C (or C++).
If you're simply printing it out, storing it, comparing it, searching it, etc, how would having the number of unicode codepoints help? What would it get used for?
> I avoid doing this. Just use `struct string { ... };'. It makes it clear what you're handling.
Well then imagine if Gtk made you write `struct GtkLabel`, etc. and you saw hundreds of `struct` on the screen taking up space in heavy UI code. Sometimes abstractions are worthwhile.
If I know for sure I'm never going to need to do that then OK.
typedef struct foo foo;
and somewhere else
struct foo { … };
TBH, in that case the GtkLabel (and, indeed, the entire widget hierarchy) should be opaque pointers anyway.
If you're not using a struct as an abstraction, then don't typedef it. If you are, then hide the damn fields.
Now, I think this is somewhere between not a good idea and absolutely awful.
Yes, sometimes you wish something were different in a programming language: “if only these types had shorter names”. But when you work in a team, first you need consensus, and then modifying the language becomes a heavy load that every new person on the project will have to lift.
“Modifying C is porting the Lisp curse to C” is my motto. Keep everything as standard and vanilla as possible.
It's not necessary to go back in time. I proposed a way to do it in modern C - no existing code would break:
https://www.digitalmars.com/articles/C-biggest-mistake.html
It's simple, and easy to implement.
You're thinking in decades; the C standards committee is slower than that. This could work in principle, but will probably never happen in practice. Maybe people should start considering a language like D[1] as an alternative, which seems to have the spirit of both C and Go, but with much more pragmatism than either.
[1] https://en.wikipedia.org/wiki/D_(programming_language)#Criti...
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/varar...
Meanwhile, after UNIX was done at AT&T, the C language authors hardly cared about the C standards committee: the compiler features they used in Plan 9 and Inferno were only "mostly" compatible, and they followed up with authoring roles in Alef, Limbo and Go.
> The language accepted by the compilers is the core ANSI C language with some modest extensions, a greatly simplified preprocessor, a smaller library that includes system calls and related facilities, and a completely different structure for include files.
https://doc.cat-v.org/plan_9/4th_edition/papers/comp
I doubt most C advocates ever reflect on this.
> I doubt most C advocates ever reflect on this.
What would be the conclusion of this reflection? Assuming you have reflected on this, what was your conclusion?
typedef struct S { int a; } S;

becomes simply:

struct S { int a; }

and unlike C:

extern int foo();
int bar() { return foo(); }
int foo() { return 6; }

you have:

int bar() { return foo(); }
int foo() { return 6; }

For more complex things:

#include <foo.h>

becomes:

import foo;

Not only does it deliver a massive safety improvement, it dramatically speeds up strlen, strcmp, strcpy, strcat, etc. And you can pick out a substring without needing to allocate/copy. It's easy money.
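For readers who haven't followed the link: the proposal is a pointer-plus-length ("fat pointer") type. A rough approximation of the idea, sketched in today's C with made-up names:

#include <stddef.h>

/* Pointer and length travelling together (illustrative, not the proposed syntax). */
struct slice {
    const char *ptr;
    size_t      len;
};

/* O(1) "strlen": the length is carried with the data. */
static inline size_t slice_len(struct slice s) { return s.len; }

/* Substring without allocating or copying: just narrow the view. */
static inline struct slice slice_sub(struct slice s, size_t off, size_t n) {
    if (off > s.len) off = s.len;
    if (n > s.len - off) n = s.len - off;
    return (struct slice){ s.ptr + off, n };
}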
Seems it was cast away
You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.
In C, to initialize a struct without dynamic memory, you could always do the following:
struct Name {
    const char *name;
};

int parse_name(const char *name, struct Name *ret) {
    if (name) {
        ret->name = name;
        return 1;
    } else {
        return 0;
    }
}
//in user code, *hopefully*...
struct Name myname;
parse_name("mothfuzz", &myname);
But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.

I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.
typedef struct foo_ foo;
enum { FOO_SIZE = 64 };

foo *foo_init(void *p, size_t sz);
void foo_destroy(foo *p);

#define FOO_ALLOCA() \
    foo_init(alloca(FOO_SIZE), FOO_SIZE)

Implementation (size checks, etc. elided):

struct foo_ {
    uint32_t magic;
    uint32_t val;
};

foo *foo_init(void *p, size_t sz) {
    foo *f = (foo *)p;
    f->magic = 1234;
    f->val = 0;
    return f;
}

Caller:

foo *f = FOO_ALLOCA();
// Can't see inside
// APIs validate magic

This is nothing new in C. This problem has always existed by virtue of all struct members being public. Generally, programmers know to search the header file / documentation for constructor functions, instead of doing raw struct instantiation. Don't underestimate how good documentation can drive correct programming choices.
C++ is worse in this regard, as constructors don't really allow this pattern, since they can't return a None / false. The alternative is to throw an exception, which requires a runtime similar to malloc.
With enough compiler support they could be more than that. For example, I submitted a tagged union analysis feature request to gcc and clang, and someone generalized it into a guard builtin.
https://github.com/llvm/llvm-project/issues/74205
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112840
GCC proved to be too complex for me to hack this in though. To this day I'm hoping someone better than me will implement it.
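For context, a sketch of the kind of code such an analysis would check, i.e. union access guarded by the tag (names here are illustrative only):

/* Tag + union: the analysis would warn when a member is read
 * without the matching tag check. */
enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_tag tag;
    union {
        struct { double r; }    circle;
        struct { double w, h; } rect;
    } u;
};

double area(const struct shape *s) {
    switch (s->tag) {
    case SHAPE_CIRCLE: return 3.14159265358979 * s->u.circle.r * s->u.circle.r;
    case SHAPE_RECT:   return s->u.rect.w * s->u.rect.h;
    }
    return 0.0; /* reading the wrong member is what the checker would flag */
}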
For instance, it appears that no amount of discipline, even in the best developers, allows them to safely replace proper array support with a naked pointer to a memory area.
I don't like it when compilers start getting in the way though. We use C because we want to do raw things like point a structure at some memory area in order to access the data stored there. The compiler's job is to generate the expected code without screwing it up by "optimizing" it beyond recognition because of strict aliasing or some other nonsense.
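A sketch of the kind of raw access being described, with a made-up wire format; memcpy keeps it well-defined, though plenty of code just casts the buffer pointer instead:

#include <stdint.h>
#include <string.h>

/* Hypothetical on-wire message header. */
struct msg_header {
    uint16_t type;
    uint16_t len;
};

struct msg_header read_header(const unsigned char *buf) {
    struct msg_header h;
    /* Copying sidesteps strict-aliasing and alignment questions entirely;
     * the "point a struct at it" version is (struct msg_header *)buf. */
    memcpy(&h, buf, sizeof h);
    return h;
}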
You can paper over _a lot_ of C's faults. Ultimately it's not really worth it, but it's not nearly as fragile and arduous as you make it out to be.
This one was a head-scratcher for me. Yeah, there's no cost to check for it, but architectures where CHAR_BIT != 8 are rarer even than 24-bit architectures.
E.g. `uint8_t *p = (uint8_t *)&astruct` may violate strict aliasing, but `char *p = (char *)&astruct` is guaranteed legal. Then modulo, traps, padding, overflow, promotion, etc.
It's generally not clear to the compiler, and that can result in missed optimization opportunities.
If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language, usually Rust or C#.
Now that is some C habit for the modern day... But huh, not C.

(To confirm: download the LhA archive from https://aminet.net/package/util/wb/MagicWB21p then open the archive in 7-zip, extract Patterns/BallsMany then load into an ILBM viewer, e.g. https://www.retroreversing.com/ilbm )
Instead, use the stack much more and have a limit on how much data the program can handle, fixed at startup. It forces you to think up front about what happens if your system runs out of memory.
Like OP said, it's not a solution for all types of programs. But it makes for very stable software with known and easily tested error states. Also adds a bit of fun in figuring out how to do it.
As someone who spent most of their career as an embedded dev, yes, this is fine for (like parent said) some types of software.
Even for places where you'd think this is a bad idea, it can still be a good approach, for example allocating and mapping all memory up to the limit you are designing for. Honestly, this is how engineering is done: you have specified limits in the design, and you work explicitly to those limits.
So "allocate everything at startup" need not be "allocate everything at program startup", it can be "allocate everything at workflow startup", where "workflow" can be a thread, a long-running input-directed sequence of functions, etc.
For example, I am starting a tiny stripped down web-server for a project, and my approach is going to be a single 4Kb[1] block for each request, allocated via a pool (which can expand on pressure up to some maximum) and returned to the pool once the response is sent.
The 4Kb includes at most 14 headers (regardless of each header's size) with the remaining space for the JSON payload. The JSON payload is limited to at most 10 fields. This makes parsing everything "allocate-less", because the array holding pointers to the keys+values of the headers is `const char *headers[14]` and to the payload JSON data `const char *fields[10]`.
A request that doesn't fit in any of that will be rejected. This means that everything is simple and the allocation for each request happens once at startup (pool creation) even while parsing the input.
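A hedged sketch of what that per-request block might look like (the names and layout below are my own guesses at the description, not the actual code):

#include <stddef.h>

#define REQ_BLOCK_SIZE 4096  /* the "4Kb" block                */
#define MAX_HEADERS    14    /* at most 14 headers             */
#define MAX_FIELDS     10    /* at most 10 JSON payload fields */

struct request {
    const char *headers[MAX_HEADERS]; /* point into buf, no allocation */
    const char *fields[MAX_FIELDS];   /* point into the JSON payload   */
    size_t      used;                 /* bytes of buf consumed so far  */
    char        buf[REQ_BLOCK_SIZE];  /* raw request bytes live here   */
};

/* Blocks come from a pool created at startup; a request that doesn't fit
 * in REQ_BLOCK_SIZE, MAX_HEADERS or MAX_FIELDS is rejected outright.   */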
I'm toying with the idea of doing the same for responses too, instead of writing it out as and when the output is determined during the servicing of the request.
-------------------------
[1] I might switch to 6Kb or 8Kb if requests need more; whatever number is chosen, it's going to be a static number.
One other thing I tend to do: anything that needs to live longer than the current call stack gets copied into a queue of some sort. I feel it's kinda doing manually what Rust's borrow checker tries to enforce.
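A minimal sketch of that idea, assuming a fixed-capacity ring buffer and a made-up event type:

#include <stddef.h>

struct event { int type; char payload[120]; };

struct event_queue {
    struct event items[64];
    size_t head, tail, count;
};

/* Copy the event by value so the caller's stack copy can safely die. */
int queue_push(struct event_queue *q, const struct event *e) {
    size_t cap = sizeof q->items / sizeof q->items[0];
    if (q->count == cap) return 0;  /* full: caller decides what to do */
    q->items[q->tail] = *e;         /* struct copy, no heap involved   */
    q->tail = (q->tail + 1) % cap;
    q->count++;
    return 1;
}

int queue_pop(struct event_queue *q, struct event *out) {
    size_t cap = sizeof q->items / sizeof q->items[0];
    if (q->count == 0) return 0;
    *out = q->items[q->head];
    q->head = (q->head + 1) % cap;
    q->count--;
    return 1;
}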
Only allocate on the heap if you absolutely have to.
I still have a lot of conversion to do before I can try this in my hobby project, but these are interesting ideas.
#include <limits.h>
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
In modern C you can use static_assert to make this a bit nicer:

static_assert(CHAR_BIT == 8, "CHAR_BIT is not 8");

...although it would be a bit of a shame IMHO to add that reflexively in code that doesn't necessarily require it.

https://en.cppreference.com/w/c/language/_Static_assert.html
It's a pretty neat way to drop some corner cases from your mental load without building subtle traps.
Bro, that was written in 2019. If it's not old enough to drink, it's not yet evergreen. But it's also long-winded. A 25-minute read, and y'know what the conclusion is? "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
I need all the baby programmers in the back to hear me: type systems are bikeshedding. The point of a type is only to restrict computation to a fixed set. This concept can be applied anywhere you need to ensure reliability and simplicity. You don't need a programming language to natively support types in order to implement the concept yourself in that language.
In a programming language that doesn't enforce types, how do you implement
> "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
Adding complexity to your type system and to the representation of types within your code has a cost in terms of mental overhead. It's become trendy to have this mental model where the cost of "type safety" is paid in keystrokes but pays for itself in reducing mental overhead for the developers. But in reality you're trading one kind of mental overhead for another, the cost you pay to implement it is extra.
It's like "what are all the ways I could use this wrong" vs "what are all the possibilities that exist". There's no difference in mental overhead between between having one tool you can use in 500 ways or 500 tools you can use in 1 way, either way you need to know 500 things, so the difference lies elsewhere. The effort and keystrokes that you use to add type safety can only ever increase the complexity of your project.
If you're going to pay for it, that complexity has to be worth it. Every single project should be making a conscious decision about this on day one. For the cost to be worth it, the rate of iteration has to be low enough and the cost of runtime bugs has to be high enough. Paying the cost is a no brainer on a banking system, spacecraft or low level library depended on by a million developers.
Where I think we've lost the plot is that NOT paying the cost should be a no brainer for stuff like front end web development and video games where there's basically zero cost in small bugs. Typescript is a huge fuck up on the front end, and C++ is a 30 year fuck up in the games industry. Javascript and C have problems and aren't the right languages for those respective jobs, but we completely missed the point of why they got popular and didn't learn anything from it, and we haven't created the right languages yet for either of those two fields.
Same concept and cost/benefit analysis applies to all forms of testing, and formal verification too.
I'll ditch type-safety in experimental/exploratory code; I'll use Lisp (or, more recently, Python) to test if something is a good idea. For anything that ships to production, I think a basic level of type enforcement is necessary, even if you don't want the whole type zoo.
For your Javascript f/end context, I like the proposed TC39 approach (https://github.com/tc39/proposal-type-annotations?tab=readme...). The typing is optional, does not break existing syntax and can still be used to enforce a basic level of type safety if the developer wants it.
----------------------------
[1] I upvoted you anyway. Your broader point is still valid.
#define END }
/* scream! */
Given C++'s adoption in 1990s commercial software and major consumer operating systems (Apple, IBM, Microsoft, Be), I bet that if the FSF, with their coding guidelines, had not advocated for C, its adoption would not have taken off beyond those days.
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12
And yet another C++ person salty that people prefer simpler things.
I agree with this, but then again, not many people are learning C now anyway. It will die away from natural attrition, is my point.
K&R C does have a few advantages: the compilers at the time were not so aggressive in optimisation and would consistently emit code that (for example) performed a NULL dereference (or other UB), ensuring things like consistently crashing instead of silently losing data or doing the wrong thing.
Well, certainly simpler than C++, at any rate.
I mean, just knowing the assignment rules in C++ is worthy of an entire book on its own. Understandably, the single rule of "assignment is a bitwise copy of the source variable into the destination variable" is inflexible, but at least the person reading the local code can, just from the current scope, determine whether some assignment is a bug or not!
In many ways, C++ requires global context when reading any local scope: will the correct destructor get called? Can this variable be used as an argument to a function (a lack of a copy constructor results in a bitwise copy on the stack, with the destructor for that instance running twice: once for the copy and again when the original's scope ends)? Is this being passed by reference (i.e. it might be modified by the function we are calling) or by value (i.e. we don't need to worry about whether `bar` has been changed after a call to `foo(bar)`)?
Many programmers don't like holding lots of global scope in their head when working in some local scope. In C, all those examples above are clear in the local scope.
All programmers who prefer C over C++ have already tried C++ in large and non-trivial projects before walking away. I doubt that the reverse is true.
There is this urban myth that C is simple, from folks that never read the ISO C standard, can't read legalese, and never spent much time browsing their compiler's reference manual.
They mostly learnt K&R C and assume the world is simple, until the code gets ported to another platform or compiler.
Yet in such a simple language, I keep waiting to meet the magical developer that never wrote memory corruption errors with pointer arithmetic, string and memory library functions.
And yet you know from previous discussions that folks like Uecker and myself have done all those things, and still walked away from C++.
In my case, I stepped back even after a decade of work experience in it. For anything needing more abstraction than C, C++ is not going to be a good fit anyway (there are better languages).
> Yet in such a simple language, I keep waiting to meet the magical developer that never wrote memory corruption errors with pointer arithmetic, string and memory library functions.
Who made that claim? This sounds like a strawman - "If you use C you'll never make this class of errors", which no one said in this conversation.
In any case, the point is even more true of C++ - I have yet to meet this magical C++ programmer that never hits the few dozens of footguns it has that C doesn't.