Linux Internals: How /proc/self/mem writes to unwritable memory (2021)
80 points | 10 hours ago | 6 comments | offlinemark.com
hansendc
9 hours ago
"On x86-64, there are two CPU settings which control the kernel’s ability to access memory."

There are a couple more than two, even in 2021.

Memory Protection Keys come to mind, as do the NPT/EPT tables when virtualization is in play. SEV and SGX also have their own ways of preventing the kernel from writing to memory. The CPU also has range registers that protect certain special physical address ranges, like the TDX module's range. You can't write there either.

That's all that comes to mind at the moment. It's definitely a fun question!

karlgkk
4 hours ago
a thought: does MPK actually control the kernel's ability to access memory? on intel, i think if you try to read that memory, a page fault won't be thrown. although with PKS, kernel reads will cause a page fault.

so can the kernel (ring0) freely read/write to memory protected with MPK? I think so, yes. good luck with whatever happens next tho lol

aliceryhl
2 hours ago
Interesting. Though looking at the code, it does still check VM_MAYWRITE, so the mapping needs to be something you could remap as writable.
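A rough way to probe the VM_MAYWRITE point from userspace, offered as a sketch rather than a statement of what the kernel guarantees: map a file that was opened read-only, then attempt a write through /proc/self/mem. The helper name `forced_write_succeeds` is hypothetical, and the code assumes 64-bit Linux with glibc reachable through `ctypes.CDLL(None)`. The assumption being tested is that a MAP_SHARED mapping of a read-only file descriptor has no VM_MAYWRITE and so is refused, while a MAP_PRIVATE mapping of the same file is copy-on-write and accepts the forced write.

```python
import ctypes
import mmap
import os
import tempfile

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *)-1 as an unsigned integer


def forced_write_succeeds(map_flags: int) -> bool:
    """Map a read-only file with PROT_READ and try writing via /proc/self/mem.

    Hypothetical helper for illustration. Returns True if the kernel accepted
    the write, False if pwrite() failed with an OSError.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.restype = ctypes.c_int
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    size = mmap.PAGESIZE
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * size)
        tmp.flush()

        # Open the file read-only so a shared mapping cannot later be made writable.
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            addr = libc.mmap(None, size, mmap.PROT_READ, map_flags, fd, 0)
        finally:
            os.close(fd)  # the mapping stays valid after the descriptor is closed
        if addr is None or addr == MAP_FAILED:
            raise OSError(ctypes.get_errno(), "mmap failed")

        try:
            mem_fd = os.open("/proc/self/mem", os.O_RDWR)
            try:
                # The file offset in /proc/self/mem is the virtual address.
                os.pwrite(mem_fd, b"x", addr)
                return True
            except OSError:
                return False
            finally:
                os.close(mem_fd)
        finally:
            libc.munmap(addr, size)
```

Calling `forced_write_succeeds(mmap.MAP_SHARED)` and `forced_write_succeeds(mmap.MAP_PRIVATE)` side by side should show the difference; I would expect False for the shared case and True for the private one, though this has not been checked across kernel versions or hardened configurations.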
anthk
35 minutes ago
/proc is a bad imitation of Plan 9's /proc.
KenoFischer
6 hours ago
I'm still surprised I was the first one to notice when Linus tried to change this - I always thought it was a pretty well known behavior.
bluepeter
9 hours ago
The kernel owns the page tables. It can always find another way in.
vlovich123
4 hours ago
But the point here is that userspace can use this to bypass kernel protections that would otherwise prevent it from mutating R^X pages for example, not that the kernel can bypass its own.
pjmlp
1 hour ago
Not really. One of the security measures on Windows is exactly to control how the kernel can access secure process memory, as a possible mitigation against attacks by rogue drivers.

Naturally, it is the kind of stuff that requires the latest Windows 11 with the nice Pluton security processor, as part of the Copilot+ PC design.

mschuster91
9 hours ago
> The kernel owns the page tables.

Not entirely. IOMMU is a thing; IIRC that is how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even if the host is compromised (and, by extension, also if the feds arrive to v& your server).

gruez
6 hours ago
>how Amazon and other hyperscalers can promise you virtual machines whose memory cannot be touched even in the case the host is compromised (and, by extension, also if the feds arrive to v& your server).

Even if we take those promises at face value, it practically doesn't mean much because every server still needs to handle reboots, which is when they can inject their evil code.

Borealid
5 hours ago
MK-TME allows having memory encrypted at run time, and the platform TPM signs an attestation saying the memory was not altered.

Malicious code can't be injected at boot without breaking that TPM.

fc417fc802
5 hours ago
Subject to the huge caveat that the attacker does not have physical access. https://tee.fail/
Borealid
2 hours ago
An interesting implementation flaw, but not a conceptual problem with the design.
fc417fc802
1 hour ago
Well, it kind of is actually. The previous iteration of the design didn't have that vulnerability but it was slower because managing IVs within the given constraints adds an additional layer of complexity. This is the pragmatic compromise so to speak.

Does it count as a conceptual problem when technical challenges without an acceptable solution block your goal?

ronsor
7 hours ago
If your threat model is being v& by feds, maybe you should keep your server at home behind Tor.
haberman
7 hours ago
TL;DR: when a user writes to /proc/self/mem, the kernel bypasses the MMU and hardware address translation, opting to emulate it in software (including emulated page faults!), which allows it to disregard any memory protection that is currently set up in the page tables.
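To make the behavior concrete, here is a minimal sketch that maps an anonymous private page with PROT_READ only and then changes its contents by writing to /proc/self/mem at the page's address. The function name `write_readonly_page_via_proc_mem` is hypothetical, and the code assumes 64-bit Linux, a libc reachable through `ctypes.CDLL(None)`, and a kernel that still permits forced writes through /proc/self/mem (some hardened or sandboxed setups may not).

```python
import ctypes
import mmap
import os

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *)-1 as an unsigned integer


def write_readonly_page_via_proc_mem(payload: bytes) -> bytes:
    """Write payload into a PROT_READ page via /proc/self/mem and read it back.

    Hypothetical helper for illustration; payload must fit in one page.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.restype = ctypes.c_int
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    size = mmap.PAGESIZE
    if len(payload) > size:
        raise ValueError("payload larger than one page")

    # Read-only, private, anonymous mapping: a plain store to it would fault.
    addr = libc.mmap(None, size, mmap.PROT_READ,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        raise OSError(ctypes.get_errno(), "mmap failed")

    try:
        fd = os.open("/proc/self/mem", os.O_RDWR)
        try:
            # The file offset in /proc/self/mem is the virtual address.
            os.pwrite(fd, payload, addr)
        finally:
            os.close(fd)
        # Reading is allowed by the page protections, so inspect it directly.
        return ctypes.string_at(addr, len(payload))
    finally:
        libc.munmap(addr, size)
```

If the write goes through, `write_readonly_page_via_proc_mem(b"patched")` returns the bytes that were just written, even though the page was never mapped writable. If the environment blocks the write, `os.pwrite` raises `OSError` instead.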
IAmLiterallyAB
4 hours ago
It doesn't bypass it exactly; it's still accessing it via virtual memory and the page tables. It's just that the kernel maintains one big linear memory map of RAM that's writable.
rramadass
6 hours ago
Thank You.