Gathering Linux Syscall Numbers in a C Table
52 points
4 days ago
| 5 comments
| t-cadet.github.io
| HN
halb
3 hours ago
[-]
There is an existing project that tracks and gather syscalls in the linux kernel, for all ABIs: https://github.com/mebeim/systrack . The author maintains a table here, which is incredibly useful: https://syscalls.mebeim.net/?table=x86/64/x64/latest
reply
rwmj
3 hours ago
[-]
reply
jmgao
2 hours ago
[-]
> I was expecting a unified interface across all architectures, with perhaps one or two architecture-specific syscalls to access architecture-specific capabilities; but Linux syscalls are more like Swiss cheese.

There's lots of historical weirdness, mostly around stuff where the kernel went "oops, we need 64-bit time_t or off_t or whatever" and added, for example, getdents64 to old platforms, but new platforms never got the broken 32-bit version. There are some more interesting cases, though, like how until fairly recently (i.e. about a decade ago for the mainline kernel), on x86 (and maybe other platforms?) there weren't individual syscalls for each socket syscall, they were all multiplexed through socketcall.

reply
pjmlp
3 hours ago
[-]
> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.

Because Linux is the exception, UNIX public API is the C library as defined later by POSIX.

The goal to create C and rewrite UNIX V4 into C was exactly to move away from this kind of platform details.

Also UNIX can be seen as C's runtime, in a way, thus traditionally the C compiler was the same of the platform vendor, there were not pick and chose among C compilers and standard libraries, that was left for non-UNIX platforms.

reply
wahern
2 hours ago
[-]
All true, but note that BSD introduced, and both Linux/glibc and Linux/musl support, a syscall(2) wrapper routine that takes a syscall number, a list of arguments (usually as long's), and performs the syscall magic. The syscall numbers are defined as macros beginning with SYS_. The Linux kernel headers export syscall numbers with macros using the prefix __NR_, but to match the BSD interface Linux libc headers usually translate or otherwise define them using a SYS_ prefix. Using the macros is much better because the numbers often vary by architecture for the same syscall.

See https://man7.org/linux/man-pages/man2/syscall.2.html

reply
pjmlp
2 hours ago
[-]
Except with BSDs you are on your own if you go down that route, because there are no stability guarantees.

It is more of an implementation detail for the rest of the C APIs than anything else.

reply
wahern
2 hours ago
[-]
Indeed. Another reason to use the system's macros rather than hardcoding integer literals--the numbers can change between releases. Though that doesn't guarantee the syscall works the same way between releases wrt parameters and return value semantics, if it still exists at all. And I believe OpenBSD removed the syscall wrapper altogether after implementing the pinsyscalls feature.
reply
muvlon
2 hours ago
[-]
I actually love this about Linux. The syscall API is much better than libc (both the one defined by POSIX and libc as it actually exists on different Unixen). No errno (which requires weird and inefficient TLS bullshit), no hooks like atfork/atexit/etc., no locales, no silly non-reentrant functions using global variables, no dlopen/dlclose. Just the actual OS interface. Languages that aren't as braindead as C can have their own wrappers around this and skip all that nonsense.

Also, there are syscalls which are basically not possible to directly expose as C functions, because they mess with things that the C runtime considers invariant. An example would be `SYS_clone3`. This is an immensely useful syscall, and glibc uses it for spawning threads these days. But it cannot be called directly from C, you need platform-specific assembly code around it.

reply
oguz-ismail2
1 hour ago
[-]
> But it cannot be called directly from C

No system call can, you need a wrapper like syscall() provided by glibc. glibc also provides a dedicated wrapper for the clone system call which properly sets up the return address for the child thread. No idea what you're angry about

reply
muvlon
1 hour ago
[-]
Sure, you need a tiny bit of asm to do the actual syscall. That's not what I'm talking about. Most syscalls are easy to wrap, clone is slightly harder but doable (as evidenced by glibc). clone3 is for all intents and purposes impossible to write a general C wrapper for. It allows you to create situations such as threads that share virtual memory but not file descriptors, or vice-versa. That is, it can leave the caller in a situation that violates core assumptions by libc.
reply
oguz-ismail2
57 minutes ago
[-]
You're mixing things up. C the language doesn't know about virtual memory or file descriptions. Those are OS features.
reply
adrian_b
34 minutes ago
[-]
The C library maintains its own set of file descriptors, which are mapped to the OS file descriptors (because the stdio file descriptors and the OS file descriptors have different types and different behaviors).

I do not know whether this is true, but perhaps the previous poster means that using clone3 with certain arguments may break this file descriptor mapping so invoking after that stdio functions may have unexpected results.

Also the state kept by the libc malloc may get confused after certain invocations of clone3, because it has memory pages that have been obtained through mmap or sbrk and which may sometimes be returned to the OS.

So libc certainly cares about the OS file descriptors and virtual memory mappings, because it maintains its own internal state, which has references to the corresponding OS state. I have not looked to see when an incorrect state can result after a clone3, but it is plausible that such cases may exist, so that glibc allows calling clone3 only with a restricted combination of arguments and it does not provide a wrapper that would allow other combinations of arguments.

reply
pm215
1 minute ago
[-]
Yes; this is why QEMU's user-space-emulation clone syscall handling restricts the caller to only those combinations of clone flags which match either "looks like fork()" or "looks like creating a new pthread", because QEMU itself is linked with the host libc and weird clone flag combinations will put the new process/thread into a state the libc isn't expecting.
reply
adrian_b
2 hours ago
[-]
Even if one would want to use Linux only through libc, that is not always possible.

Linux has evolved beyond POSIX and many newer syscalls, which can enhance performance in certain scenarios, are not available as libc functions.

They may be invoked either using the generic syscall wrappers provided by glibc besides the standard functions, or by using custom wrappers or possibly by using some special libraries, if such libraries are available.

reply
pjmlp
2 hours ago
[-]
That isn't a valid reason, given the existence of Solaris, HP-UX, DG/UX, Tru64, NeXTSTEP, and so many other UNIXes that grew beyond AT&T UNIX System V.

All of them provide C APIs to their additional features not covered by POSIX.

What Linux has is that due to the way syscalls are exposed there is a certain laziness to cover everything on glibc, or its replacements like musl.

reply
pm215
2 hours ago
[-]
I think rather than "laziness" I would say it's an instance of the widespread phenomenon of "shipping the orgchart". Because for Linux the kernel developers and the libc developers are separate communities, the boundary between those components becomes more meaningful, more visible to the end-user, and more likely to have lag where one side supports something and the other doesn't yet. (That goes both ways, incidentally -- the handling of POSIX threads was driven more from the libc side of the fence and it took a while before the kernel provided enough facilities to make it cleanly doable, and there are still corners like setuid() where there is a mismatch between what the standard wants and the primitives the kernel provides). Where an OS has a more tightly integrated development team the kernel/libc boundary is more likely to stay an internal one.
reply
cb321
22 minutes ago
[-]
This description matches my own experience. I recall having to use my own macro-based syscall() things when the inotify system was first introduced because glibc did not have support for years and then it was years more for slow moving Linux distros to pick up the new glibc version.

Unsaid was that much of this project separation comes from glibc being born as (and probably still being) a "portable libc with extra GNU-ish features", not a Linux-specific thing.

Honesty, some of this pain might have been avoided had the Bell Labs guys made two libraries - the syscall interface part of `libc`, called say `libos`, and the more articulated language run-time (string/buffered IO/etc./etc) the actual `libc`. Then the kernel could "easily" ship with libos and libc's could vary. To even realize this might be helpful someday likely required foresight beyond reason in the mid-1970s. Then, afterwards, Makefile's and other build system stuff probably wanted to stay with "-lc" in various places and then glibc/others wanted to support that and so it goes. Integration can be hard to un-do.

reply
masklinn
2 hours ago
[-]
IIRC it’s not just laziness, there are things glibc explicitly doesn’t want to expose for various reasons, and since the two projects are essentially unrelated you get the intersection of what both sides are happy with.

Traditional unices develop the kernel and the libc together, as a system, so any kernel feature they want to expose they can just do so.

reply
adrian_b
2 hours ago
[-]
You are right, but I did not comment on what might be desirable, but on what is the current status.

Because I do not like certain decisions in the design of glibc, I am skeptical about their ability do define good standard APIs for the more recent syscalls, so perhaps it is better that they did not attempt to do this.

reply
leni536
2 hours ago
[-]
> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.

Isn't that nolibc.h?

reply
adrian_b
2 hours ago
[-]
Nolibc, which is incorporated in the Linux kernel sources (under "tools"), contains generic wrappers for the Linux syscalls and a set of simplified implementations for the most frequently needed libc functions.

It is useful for very small executables or for some embedded applications.

It is not useful for someone who would want to use the Linux syscalls directly from another programming language than C, bypassing glibc or other libc implementations, except by providing models of generic wrappers for the Linux syscalls.

It also does not satisfy the requirement of the parent article, because it does not contain a table of syscalls that could be used for separate compilations.

Nolibc implements its functions like this:

  long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
  {
   return my_syscall3(__NR_ioctl, fd, cmd, arg);
  }
where the syscall numbers like "__NR_ioctl" are collected from the Linux kernel headers, because nolibc is compiled in the kernel source tree.

As explained in the parent article, there is no unique "__NR_ioctl" in the kernel sources. The C headers that are active during each compilation are selected or generated automatically based on the target architecture and on a few other configuration options, so searching for the applicable "__NR_ioctl" can be tedious, hence the value of the parent article and of a couple of other places mentioned by other posters, where syscall tables can be found.

reply
jesse__
4 hours ago
[-]
I've been thinking about doing this for a little side project for some time. Looking forward to the eventual conclusion :)
reply