Closures as Win32 Window Procedures
70 points
10 hours ago
| 3 comments
| nullprogram.com
| HN
RossBencina
7 hours ago
[-]
> This is more work than going through GWLP_USERDATA

Indeed, aside from a party trick, why build an executable trampoline at runtime when you can store and retrieve the context, or a pointer to the context, with SetWindowLong() / GetWindowLong() [1]?

Slightly related: in my view Win32 windows are a faithful implementation of the Actor Model. The window proc of a window is mutable, it represents the current behavior, and can be changed in response to any received message. While I haven't personally seen this used in Win32 programs it is a powerful feature as it allows for implementing interaction state machines in a very natural way (the same way that Miro Samek promotes in his book.)

[1] https://learn.microsoft.com/en-us/windows/win32/api/winuser/...

reply
ack_complete
5 hours ago
[-]
There's an annoying corner case when using SetWindowLongPtr/GetWindowLongPtr() -- Windows sends WM_GETMINMAXINFO before WM_NCCREATE. This can be worked around with a thread local, but a trampoline inherently handles it. Trampolines are also useful for other Win32 user functions that don't have an easy way to store context data, such as SetWindowsHookEx(). They're also slightly faster, though GetWindowLongPtr() at least seems able to avoid a syscall.

The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.

reply
rovingeye
2 hours ago
[-]
I wasn't aware of the thread local trick, I solve this problem by not setting WS_VISIBLE and calling SetWindowPos & ShowWindow after CreateWindow returns (this solves some other problems as well..)
reply
timokr
1 hour ago
[-]
The combination `GWLP_USERDATA` to pass state and `GWLP_WNDPROC` to update the actual wnd procedure is what I used in my [rust wrapper](https://github.com/timokroeger/winmsg-executor/blob/main/src...).

This two step approach is the only way I found to use rust closures for wndproc without double allocation and additional indirection.

reply
cyberax
8 hours ago
[-]
This approach was used in the ATL/WTL (Active Template Library, Windows Template Library) in the early 2000-s. It was a bad idea, because you need to generate executable code, interfering with NX-bit memory protection.

Windows actually had a workaround in its NX-bit implementation that recognized the byte patterns of these trampolines from the fault handler: https://web.archive.org/web/20090123222148/http://support.mi...

reply
barrkel
43 minutes ago
[-]
It was also used by Delphi in 90s.
reply
kmeisthax
4 hours ago
[-]
I'm genuinely surprised Microsoft's attitude towards "wndprocs don't have a context pointer" was "let's JIT compile a trampoline to hold the context pointer" and not to add support for a five-parameter wndproc into USER.dll, or have a wrapper that grabs GWLP_USERDATA and copies it to the register this lives in.
reply
pjc50
3 hours ago
[-]
Doesn't x32 only have four registers available in the calling convention, AX-DX?
reply
ack_complete
1 hour ago
[-]
The stdcall calling convention used APIs and API callbacks on Windows x86 doesn't use registers at all, all parameters are passed on the stack. MSVC does support thiscall/fastcall/vectorcall conventions that pass some values in registers, but the system APIs and COM interfaces all use stdcall.

Windows x64 and ARM64 do use register passing, with 4 registers for x64 (rcx/rdx/r8/r9) and 8 registers for ARM64 (x0-x7). Passing an additional parameter on the stack would be cheap compared to the workarounds that everyone has to do now.

reply
pwdisswordfishy
1 hour ago
[-]
The x32 ABI uses the same fastcall convention as the regular x86-64 ABI. It's mostly syscall numbers that are affected by the shrunk pointers.
reply
Philpax
8 hours ago
[-]
Hah! I usually allocate trampolines at runtime, as the article suggests, but reserving R/W space for them within the application's memory space is a cute trick.

Probably not useful for most of my use cases (I'm usually injecting a payload, so I'd still have the pointer-distance issue between the executable and my payload), but it's still potentially handy. Will have to keep that around!

reply