Unicorn is designed to be fully customizable: you can select which Unicode algorithms and character properties are included or excluded from compilation. You can also exclude Unicode character blocks wholesale for scripts your application does not support. It's perfect for resource-constrained devices like microcontrollers and IoT devices.
About me: I quit my Big Corp job a few years back to pursue my passion for software development and this is one of my first commercial releases.
1.2 Required Compliant (verified by compiling with Clang's -pdentic flag)
^^^^^^^^
Or am I too pedantic?
List the platforms (& compilers) that you've tested on.
Compare (pros/cons) against other Unicode libs (like others have done elsewhere in this thread, i.e. https://news.ycombinator.com/item?id=42424637 and https://news.ycombinator.com/item?id=42424638)
As for the compilers, I’ve tested the library with GCC, Clang, and MSVC, and with the -pedantic flag like the GP mentioned. The library should build with any standard-compliant C99 compiler.
You can use Unicorn for non-commercial use [1], but yes, for commercial use you need to buy a license.
> It's much faster to use precomputed tables per algorithm
You're absolutely right about using precomputed tables per algorithm. That is the secret to the library's speed.
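To make the "precomputed tables" idea concrete, here is a minimal sketch of a two-stage lookup table in C. It is not Unicorn's actual table layout or API: the property (uppercase ASCII letter), the names `stage1`, `stage2`, and `is_upper_sketch`, and the 16-entry block size are all assumptions chosen to keep the example small. Real tables would be generated offline from the Unicode Character Database.

```c
#include <stdint.h>

/* Toy parameters (assumptions for illustration only). */
#define BLOCK_SHIFT 4u
#define BLOCK_SIZE  (1u << BLOCK_SHIFT)
#define MAX_CP      0x7Fu

/* Stage 2: deduplicated blocks of 16 property values. */
static const uint8_t stage2[3][BLOCK_SIZE] = {
    /* block 0: no uppercase letters */
    {0},
    /* block 1: U+0040..U+004F, '@' is not a letter, 'A'..'O' are */
    {0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
    /* block 2: U+0050..U+005F, 'P'..'Z' are letters, the rest are not */
    {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
};

/* Stage 1: one entry per 16 code points, giving an index into stage2. */
static const uint8_t stage1[8] = {0, 0, 0, 0, 1, 2, 0, 0};

/* Hypothetical lookup: two array reads, no branching on the property itself. */
static int is_upper_sketch(uint32_t cp)
{
    int result = 0;
    if (cp <= MAX_CP) {
        result = stage2[stage1[cp >> BLOCK_SHIFT]][cp & (BLOCK_SIZE - 1u)];
    }
    return result;
}
```

Because identical blocks are stored once in `stage2`, large ranges that share the same property value cost only one byte each in `stage1`, which is roughly why per-algorithm tables can stay small and fast.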
> Free and small is my safeclib, which does about half of it.
I like safeclib! It's nice to hear from the author. It's worth distinguishing that safeclib is a safer string library whereas Unicorn is a Unicode algorithms library, not a string library.
[1] https://github.com/railgunlabs/unicorn/blob/master/LICENSE
So every string library needs at least a compare function to find strings, with all the variants of same graphemes. Which leads us to NFC normalization for a start. Upcase tables and wordlength tables are also needed.
Is it me or does this feel a bit weird? It seems like you're using the comments section here to self-advertise for exposure.
I read it like — "businesses can't use this without paying the OP, however, if you're a business you can get 50% of the way there by using _my_ library, and you don't even have to pay me!". It comes off incredibly rude to try to undercut the OP like this.
> the right to use, copy, modify, merge, publish and distribute the Software [as long as you're not selling it or derivatives of the Software]
seems to line up with exactly what the folks involved in Free Software originally wanted — the ability to fix, patch, debug software that runs on their systems. I also think it's incredibly important to have non-commercial clauses given that the vast majority of technical infrastructure in the modern world is built on FOSS, all while the companies give nothing back and developers of FOSS starve.
If Valve can dump hundreds of developers into FOSS and within, what, 7 years? bring Linux almost to parity with Windows in gaming compatibility and performance, imagine what would happen if FOSS developers were actually given funding!
MISRA C states the following rationale:
“A single point of exit is required by IEC 61508 and ISO 26262 as part of the requirements for a modular approach.
Early returns may lead to the unintentional omission of function termination code.
If a function has exit points interspersed with statements that produce persistent side effects, it is not easy to determine which side effects will occur when the function is executed.”
Note that the MISRA C rule is merely advisory, meaning it is a recommendation and not a hard requirement (i.e. it’s a “should” and not a “shall”).
It's a bit like an operating system write-protecting pages. Sometimes you write-protect pages that the process really shouldn't write to, like shared libraries or something like that. But sometimes you write-protect pages that you actually expect a program to write to, like memory-mapped pages, because then when the write happens it triggers something else (like copy-on-write or marking a page dirty or something).
Rules like "Don't use dynamically allocated memory" are one example of this -- not that they really expect you never to do it, but that marking it "Required" is a way to force you to document how you plan to make it safe.
Similarly, if it's easier to rearrange a function to have only a single exit point than to explain why you need multiple exit points, just rearrange it; if you really need multiple exit points, just document why.
profile_begin("func");
a = temp_arena_begin();
// ... code
temp_arena_end();
profile_end();
I would not be in the least surprised if someone has a compiler/transpiler from a higher level language to some C code which checks all MISRA boxes.
I wish we could get an update on these rules, but this issue has been brought up many, many times before and has always been brushed away without a proper analysis.
Otherwise this leads to duplication of cleanup code similar to
allocate_something()
..
if failed(foo) {
    deallocate_something()
    return FAILED;
}
..
deallocate_something()
return SUCCESS;
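For comparison, a single-exit version of the same shape might look like the sketch below. The function `measure_copy` and the `status` enum are hypothetical, and `malloc`/`free` only stand in for `allocate_something()`/`deallocate_something()`; the sketch is not meant to be MISRA compliant (dynamic allocation has its own rule). The point is that the cleanup call appears exactly once.

```c
#include <stdlib.h>
#include <string.h>

enum status { SUCCESS = 0, FAILED = 1 };

/* Hypothetical helper: copies src into a temporary buffer and reports its length. */
static enum status measure_copy(const char *src, size_t *out_len)
{
    enum status result = FAILED;   /* assume failure until every step succeeds */
    char *buf = NULL;

    if ((src != NULL) && (out_len != NULL)) {
        size_t n = strlen(src);
        buf = malloc(n + 1u);
        if (buf != NULL) {
            memcpy(buf, src, n + 1u);
            *out_len = strlen(buf);
            result = SUCCESS;
        }
    }

    free(buf);       /* free(NULL) is a no-op, so this is the only cleanup site */
    return result;   /* single point of exit */
}
```

The trade-off is deeper nesting in exchange for never being able to forget the cleanup on a failure path.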
Or hopefully, eventually, in C, thanks to the tireless efforts of JeanHeyd Meneide:
https://thephd.dev/just-put-raii-in-c-bro-please-bro-just-on...
Now I'm really curious, doesn't that mean some valid C++ code would fail to link for having multiple definitions of the same symbol??
I would expect name mangling to be a bijection from function prototype to string.
Projects like this never fail to impress me vis-a-vis source obfuscation. The 'generate.pyz' is an interesting twist on the usual practice.
# You may not reverse engineer, decompile, disassemble, or otherwise attempt
# to derive the source code or underlying structure of this script
This prohibition is void in certain relevant jurisdictions, for any publicly available product.
ICU is a large library, typically around 40 MB depending on the platform, whereas Unicorn, with all features enabled, is only about 600 KB.
ICU has a broader scope: it's not just a Unicode library, but also an internationalization library. Unicorn, on the other hand, is specifically focused on Unicode algorithms.
ICU wasn't designed to be customized. It's also non-MISRA compliant and written in C++11. In contrast, Unicorn is written in C99, fully customizable, MISRA compliant, and only requires a few features from libc [1]. It's far more portable.
[1] https://github.com/railgunlabs/unicorn/?tab=readme-ov-file#u...
Note that I am not interested in actually using Unicorn commercially, but my understanding is that this restriction makes the library incompatible with FOSS licenses such as GPL.
If you would like to chat, hit me up.