In 2016, I was trying to construct orthogonal irreducible matrix representations of various groups (“irreps”). The problem was that most of the papers describing how to construct these matrices used a recursive approach that depended on having already constructed the matrix elements of a lower-dimensional irrep. Thus the irrep dimension n became quite an annoying parameter, and function calls were very slow because you had to construct the irrep for each new group element from the ground up on every single call.
I ended up using Julia’s @generated functions to dynamically create new versions of the matrix construction code for each distinct value of n for each type of group. So essentially it would generate “unrolled” code on the fly and then use LLVM to compile it a single time, after which all successive calls for a specific group and irrep dimension were extremely fast. It was really quite cool. The only downside was that you couldn’t generate very high-dimensional irreps because LLVM would begin to struggle with the sheer volume of code it needed to compile, but for my project at the time that wasn’t much of a concern.
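For readers who don't know Julia, the nearest everyday analogue is C++ templates, where the compiler likewise stamps out and optimizes one unrolled copy of the code per dimension. The analogy is loose (templates specialize at compile time, @generated functions at run time), and the sketch below is nothing like my actual irrep construction, just the idea of per-n specialization:

  #include <array>
  #include <cstddef>

  // Hypothetical sketch: one fully specialized routine per dimension N. The
  // compiler emits and optimizes a separate unrolled copy for each distinct
  // N, much like @generated functions emit one compiled method per irrep size.
  template <std::size_t N>
  std::array<std::array<double, N>, N> make_rep(double theta) {
      std::array<std::array<double, N>, N> m{};
      // Toy stand-in for the real recursive construction; the point is that
      // both loop bounds are compile-time constants, so they can be unrolled.
      for (std::size_t i = 0; i < N; ++i)
          for (std::size_t j = 0; j < N; ++j)
              m[i][j] = (i == j) ? theta : 0.0;
      return m;
  }

  int main() {
      auto m3 = make_rep<3>(1.0);  // compiled once for N = 3
      auto m5 = make_rep<5>(2.0);  // separate specialized code for N = 5
      return static_cast<int>(m3[0][0] + m5[0][0]);
  }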
I am surprised that no programming language does something similar: regular expressions compiled to native code instead of just handled at runtime by a library like PCRE2. Implementing this in C++ or Rust should be relatively easy.
Pretty sure Julia can do it.
Unfortunately, so far as I can tell:
- LMS has not been updated for years and never moved to Scala 3. https://github.com/TiarkRompf/virtualization-lms-core
- LMS was written to also use "scala-virtualized", which is in a similar situation.
There's a small project attempting to support it with virtualization implemented in Scala 3 macros, but it's missing some components: https://github.com/metareflection/scala3-lms?tab=readme-ov-f...
I'd love to see this fully working again.
using sign = Atoms<'+', '-'>;
using digit = Range<'0', '9'>;
using onenine = Range<'1', '9'>;
using digits = Some<digit>;
using integer = Seq<Opt<Atom<'-'>>, Oneof<Seq<onenine, digits>, digit>>;
using fraction = Seq<Atom<'.'>, digits>;
using exponent = Seq<Atoms<'e', 'E'>, Opt<sign>, digits>;
using number = Seq<integer, Opt<fraction>, Opt<exponent>>;
and I've confirmed that it does all get inlined and optimized at -O3. JSON parser example here: https://github.com/aappleby/matcheroni/blob/main/examples/js...
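For anyone wondering how a pile of using declarations can turn into fast code: each matcher is just a type with a static match() function, so the composed grammar collapses into one big inlinable function. Here's a toy reimplementation of a few of those combinators (a sketch of the idea, not matcheroni's actual definitions):

  #include <cstdio>

  // Each matcher returns a pointer past the match, or nullptr on failure.
  template <char A, char B>
  struct Range {
      static const char* match(const char* s, const char* end) {
          return (s < end && *s >= A && *s <= B) ? s + 1 : nullptr;
      }
  };

  // One or more repetitions of P.
  template <typename P>
  struct Some {
      static const char* match(const char* s, const char* end) {
          const char* next = P::match(s, end);
          if (!next) return nullptr;
          while (const char* n = P::match(next, end)) next = n;
          return next;
      }
  };

  // P followed by Rest..., matched in sequence.
  template <typename P, typename... Rest>
  struct Seq {
      static const char* match(const char* s, const char* end) {
          const char* next = P::match(s, end);
          if constexpr (sizeof...(Rest) > 0)
              return next ? Seq<Rest...>::match(next, end) : nullptr;
          else
              return next;
      }
  };

  using digit  = Range<'0', '9'>;
  using digits = Some<digit>;

  int main() {
      const char* text = "12345x";
      const char* tail = Seq<digits>::match(text, text + 6);
      std::printf("matched %td chars\n", tail ? tail - text : 0);
  }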
Cue the smug Lisp weenies laughing quietly in the background.
The problem with applying this technique generally is the amount of code generated. But what if you could optimize that too? Perhaps share the common parts of the AST between the copies of the code that are generated, and overlay the changes with some data structure.
Reading this take on it, it feels like a JIT compiler could also accomplish a fair bit of this? I'm also reminded of the way a lot of older programs would generate tables during build time. I'm assuming that is still fairly common?
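Build-time tables are definitely still around, and in C++ you can have the compiler itself generate them with constexpr. A hypothetical example (the names and the character set here are made up):

  #include <array>
  #include <initializer_list>

  // A 256-entry "is this byte allowed in an email local part" table, computed
  // entirely at compile time and placed in the binary's read-only data.
  constexpr std::array<bool, 256> make_local_chars() {
      std::array<bool, 256> t{};
      for (int c = 'a'; c <= 'z'; ++c) t[c] = true;
      for (int c = 'A'; c <= 'Z'; ++c) t[c] = true;
      for (int c = '0'; c <= '9'; ++c) t[c] = true;
      for (char c : {'.', '_', '%', '+', '-'})
          t[static_cast<unsigned char>(c)] = true;
      return t;
  }

  constexpr auto kLocalChars = make_local_chars();

  bool is_local_char(unsigned char c) { return kLocalChars[c]; }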
A 54:47 presentation at CppCon 2018 is worth more than a thousand words...
see https://www.youtube.com/watch?v=QM3W36COnE4
followup CppCon 2019 video at https://www.youtube.com/watch?v=8dKWdJzPwHw
As the above GitHub repo mentions, more info at https://www.compile-time.re/
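For those who haven't watched the talks: the library they present (CTRE, the compile-time.re project) takes the pattern as a template parameter, so the regex is parsed and turned into native code at build time. Roughly like this, if I remember the API correctly (needs C++20):

  #include <ctre.hpp>
  #include <string_view>

  // The pattern is a template argument, so there is no runtime regex
  // interpretation at all; the compiler emits specialized matching code.
  constexpr bool is_number(std::string_view sv) {
      return static_cast<bool>(
          ctre::match<"-?([1-9][0-9]*|0)(\\.[0-9]+)?([eE][+-]?[0-9]+)?">(sv));
  }

  static_assert(is_number("-12.5e3"));
  static_assert(!is_number("01"));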
The other optimization I'd guess at would be to async/thread/process the checking before and after the @ symbol, so they can run in parallel (ish). Extra CPU time, but speed > CPU cycle count for this benchmark.
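Something like this sketch, maybe; everything here (the checker names, the splitting) is hypothetical, and for short strings the thread overhead would almost certainly eat the win:

  #include <cctype>
  #include <future>
  #include <string_view>

  // Placeholder checkers; imagine each being an expensive regex match.
  static bool valid_local(std::string_view s) {
      if (s.empty()) return false;
      for (unsigned char c : s)
          if (!std::isalnum(c) && c != '.' && c != '_' &&
              c != '%' && c != '+' && c != '-')
              return false;
      return true;
  }

  static bool valid_domain(std::string_view s) {
      if (s.empty() || s.find('.') == std::string_view::npos) return false;
      for (unsigned char c : s)
          if (!std::isalnum(c) && c != '.' && c != '-') return false;
      return true;
  }

  // Check the two halves of the address concurrently.
  bool valid_email(std::string_view addr) {
      auto at = addr.find('@');
      if (at == std::string_view::npos) return false;
      auto domain = std::async(std::launch::async, valid_domain,
                               addr.substr(at + 1));
      return valid_local(addr.substr(0, at)) && domain.get();
  }

  int main() { return valid_email("user@example.com") ? 0 : 1; }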
In the math department, we had a Moodle for the first-year students at my university in Argentina.
When we started, like 15 years ago, the emails of the students and TAs were split roughly evenly: 30% Gmail, 30% Yahoo!, 30% Hotmail, and 10% others (very approximate numbers).
Now the students are like 80% Gmail, 10% Live/Outlook/Hotmail, and 10% others/Yahoo. Some of the TAs are much older, so perhaps "only" 50% use Gmail.
The difference is huge. I blame the mandatory Gmail account that comes with a cell phone.
So, checking only @gmail.com is too strict, but a fast first check for @gmail.com followed by the complete regex for everything else may improve the speed a lot in the real world.
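A minimal sketch of that idea, assuming a hypothetical plausible_email() validator and the usual simplified email regex:

  #include <regex>
  #include <string>
  #include <string_view>

  // Fast path first: a cheap suffix comparison for the overwhelmingly common
  // "@gmail.com" case, with the complete regex as the fallback for the rest.
  bool plausible_email(const std::string& addr) {
      constexpr std::string_view kGmail = "@gmail.com";
      std::string_view a = addr;
      // Exactly one '@', and it starts the "@gmail.com" suffix. (A real
      // version would still sanity-check the local part's characters.)
      if (a.size() > kGmail.size() &&
          a.compare(a.size() - kGmail.size(), kGmail.size(), kGmail) == 0 &&
          a.find('@') == a.size() - kGmail.size())
          return true;
      static const std::regex full(
          R"(^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$)");
      return std::regex_match(addr, full);
  }

  int main() { return plausible_email("someone@gmail.com") ? 0 : 1; }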
"simply do X" is such a programmer fallacy at this point I'm surprised we don't have a catchy name for it yet, together with a XKCD for making the point extra clear.
The trope is "At Google we..." followed by casually mentioning "violating" the CAP theorem with Spanner or something.
It is simple, and I really do hope any first year CS student could extract a substring from a string. Have LLMs so atrophied our programming ability that extraction of a substring is considered evidence of a superior programmer?
Yes, that Eric Schmidt, CEO of Google.
1987 was the clone, 'flex' :-)
It did "compiling the regex into regular code, which can then be optimized by the compiler" before the C programming language as we know it was created. I think 'lex' was compiling regex to C before the C language even had 'struct' types, 'printf' or 'malloc'.