In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower-dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.
I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile it a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. It was really quite cool. The only downside was that you couldn’t generate very high-dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
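For readers who don't know Julia, the nearest everyday analogue is C++ templates, where the compiler likewise stamps out and optimizes one unrolled copy of the code per dimension. The analogy is loose (templates specialize at compile time, @generated functions at run time), and the sketch below is nothing like my actual irrep construction, just the idea of per-n specialization:

  #include <array>
  #include <cstddef>

  // Hypothetical sketch: one fully specialized routine per dimension N. The
  // compiler emits and optimizes a separate unrolled copy for each distinct
  // N, much like @generated functions emit one compiled method per irrep size.
  template <std::size_t N>
  std::array<std::array<double, N>, N> make_rep(double theta) {
      std::array<std::array<double, N>, N> m{};
      // Toy stand-in for the real recursive construction; the point is that
      // both loop bounds are compile-time constants, so they can be unrolled.
      for (std::size_t i = 0; i < N; ++i)
          for (std::size_t j = 0; j < N; ++j)
              m[i][j] = (i == j) ? theta : 0.0;
      return m;
  }

  int main() {
      auto m3 = make_rep<3>(1.0);  // compiled once for N = 3
      auto m5 = make_rep<5>(2.0);  // separate specialized code for N = 5
      return static_cast<int>(m3[0][0] + m5[0][0]);
  }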
I am surprised that no programming language does something similar: regular expressions compiled to native code instead of just handled at runtime by a library like PCRE2. Implementing this in C++ or Rust should be relatively easy.
Pretty sure Julia can do it.
Unfortunately, so far as I can tell:
- LMS has not been updated for years and never moved to Scala 3. https://github.com/TiarkRompf/virtualization-lms-core
- LMS was written to also use "scala-virtualized", which is in a similar situation.
There's a small project attempting to support it with virtualization implemented in Scala 3 macros, but it's missing some components: https://github.com/metareflection/scala3-lms?tab=readme-ov-f...
I'd love to see this fully working again.
using sign = Atoms<'+', '-'>;
using digit = Range<'0', '9'>;
using onenine = Range<'1', '9'>;
using digits = Some<digit>;
using integer = Seq<Opt<Atom<'-'>>, Oneof<Seq<onenine, digits>, digit>>;
using fraction = Seq<Atom<'.'>, digits>;
using exponent = Seq<Atoms<'e', 'E'>, Opt<sign>, digits>;
using number = Seq<integer, Opt<fraction>, Opt<exponent>>;
and I've confirmed that it does all get inlined and optimized at -O3. JSON parser example here: https://github.com/aappleby/matcheroni/blob/main/examples/js...
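For anyone wondering how a pile of using declarations can turn into fast code: each matcher is just a type with a static match() function, so the composed grammar collapses into one big inlinable function. Here's a toy reimplementation of a few of those combinators (a sketch of the idea, not matcheroni's actual definitions):

  #include <cstdio>

  // Each matcher returns a pointer past the match, or nullptr on failure.
  template <char A, char B>
  struct Range {
      static const char* match(const char* s, const char* end) {
          return (s < end && *s >= A && *s <= B) ? s + 1 : nullptr;
      }
  };

  // One or more repetitions of P.
  template <typename P>
  struct Some {
      static const char* match(const char* s, const char* end) {
          const char* next = P::match(s, end);
          if (!next) return nullptr;
          while (const char* n = P::match(next, end)) next = n;
          return next;
      }
  };

  // P followed by Rest..., matched in sequence.
  template <typename P, typename... Rest>
  struct Seq {
      static const char* match(const char* s, const char* end) {
          const char* next = P::match(s, end);
          if constexpr (sizeof...(Rest) > 0)
              return next ? Seq<Rest...>::match(next, end) : nullptr;
          else
              return next;
      }
  };

  using digit  = Range<'0', '9'>;
  using digits = Some<digit>;

  int main() {
      const char* text = "12345x";
      const char* tail = Seq<digits>::match(text, text + 6);
      std::printf("matched %td chars\n", tail ? tail - text : 0);
  }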
Cue the smug Lisp weenies laughing quietly in the background.
The problem with applying this technique generally is the amount of code generated. But what if you could optimize that too? Perhaps share the common parts of the AST between the copies of the code that are generated, and overlay the changes with some data structure.
Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?
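Build-time tables are definitely still around, and in C++ you can have the compiler itself generate them with constexpr. A hypothetical example (the names and the character set here are made up):

  #include <array>
  #include <initializer_list>

  // A 256-entry "is this byte allowed in an email local part" table, computed
  // entirely at compile time and placed in the binary's read-only data.
  constexpr std::array<bool, 256> make_local_chars() {
      std::array<bool, 256> t{};
      for (int c = 'a'; c <= 'z'; ++c) t[c] = true;
      for (int c = 'A'; c <= 'Z'; ++c) t[c] = true;
      for (int c = '0'; c <= '9'; ++c) t[c] = true;
      for (char c : {'.', '_', '%', '+', '-'})
          t[static_cast<unsigned char>(c)] = true;
      return t;
  }

  constexpr auto kLocalChars = make_local_chars();

  bool is_local_char(unsigned char c) { return kLocalChars[c]; }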
A 54:47 presentation at CppCon 2018 is worth more than a thousand words...
see https://www.youtube.com/watch?v=QM3W36COnE4
followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw
As the above GitHub repo mentions, more info at https://www.compile-time.re/
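For those who haven't watched the talks: the library they present (CTRE, the compile-time.re project) takes the pattern as a template parameter, so the regex is parsed and turned into native code at build time. Roughly like this, if I remember the API correctly (needs C++20):

  #include <ctre.hpp>
  #include <string_view>

  // The pattern is a template argument, so there is no runtime regex
  // interpretation at all; the compiler emits specialized matching code.
  constexpr bool is_number(std::string_view sv) {
      return static_cast<bool>(
          ctre::match<"-?([1-9][0-9]*|0)(\\.[0-9]+)?([eE][+-]?[0-9]+)?">(sv));
  }

  static_assert(is_number("-12.5e3"));
  static_assert(!is_number("01"));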
The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra CPU time, but speed > CPU cycle count for this benchmark.
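Something like this sketch, maybe; everything here (the checker names, the splitting) is hypothetical, and for short strings the thread overhead would almost certainly eat the win:

  #include <cctype>
  #include <future>
  #include <string_view>

  // Placeholder checkers; imagine each being an expensive regex match.
  static bool valid_local(std::string_view s) {
      if (s.empty()) return false;
      for (unsigned char c : s)
          if (!std::isalnum(c) && c != '.' && c != '_' &&
              c != '%' && c != '+' && c != '-')
              return false;
      return true;
  }

  static bool valid_domain(std::string_view s) {
      if (s.empty() || s.find('.') == std::string_view::npos) return false;
      for (unsigned char c : s)
          if (!std::isalnum(c) && c != '.' && c != '-') return false;
      return true;
  }

  // Check the two halves of the address concurrently.
  bool valid_email(std::string_view addr) {
      auto at = addr.find('@');
      if (at == std::string_view::npos) return false;
      auto domain = std::async(std::launch::async, valid_domain,
                               addr.substr(at + 1));
      return valid_local(addr.substr(0, at)) && domain.get();
  }

  int main() { return valid_email("user@example.com") ? 0 : 1; }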
In the math department, we had a Moodle for the first-year students at my university in Argentina.
When we started, like 15 years ago, the emails of the students and TAs were split roughly evenly: 30% Gmail, 30% Yahoo!, 30% Hotmail, and 10% others (very approximate numbers).
Now the students are like 80% Gmail, 10% Live/Outlook/Hotmail, and 10% others/Yahoo. Some of the TAs are much older, so perhaps "only" 50% use Gmail.
The difference is huge. I blame the mandatory Gmail account that comes with a cell phone.
So, checking only @gmail.com is too strict, but a fast first check for @gmail.com followed by the complete regex for everything else may improve the speed a lot in the real world.
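A minimal sketch of that idea, assuming a hypothetical plausible_email() validator and the usual simplified email regex:

  #include <regex>
  #include <string>
  #include <string_view>

  // Fast path first: a cheap suffix comparison for the overwhelmingly common
  // "@gmail.com" case, with the complete regex as the fallback for the rest.
  bool plausible_email(const std::string& addr) {
      constexpr std::string_view kGmail = "@gmail.com";
      std::string_view a = addr;
      // Exactly one '@', and it starts the "@gmail.com" suffix. (A real
      // version would still sanity-check the local part's characters.)
      if (a.size() > kGmail.size() &&
          a.compare(a.size() - kGmail.size(), kGmail.size(), kGmail) == 0 &&
          a.find('@') == a.size() - kGmail.size())
          return true;
      static const std::regex full(
          R"(^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$)");
      return std::regex_match(addr, full);
  }

  int main() { return plausible_email("someone@gmail.com") ? 0 : 1; }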
"simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.
The trope is "At Google we..." followed by casually mentioning "violating" the CAP theorem with Spanner or something.
It is simple, and I really do hope any first year CS student could extract a substring from a string. Have LLMs so atrophied our programming ability that extraction of a substring is considered evidence of a superior programmer?
Yes, that Eric Schmidt, CEO of Google.
1987 was the clone, 'flex' :-)
It did "compiling the regex into regular code, which can then be optimized by the compiler" before the C programming language as we know it was created. I think 'lex' was compiling regex to C before the C language even had 'struct' types, 'printf' or 'malloc'.